TI-BASIC is freaking crazy. Beware, traveler.
TI-Toolkit has put serious thought into this; I'll summarize our thinking, the rationale, and attribute insights to specific people where possible. This is a problem that has plagued my mind every four months for the past two years, and I'm happy to have an opportunity to write it all down.
The first thing is using our token sheets. They contain useful things for such an editor, like common keyboard-accessible ways of typing things, renderings in the TI-Font, elaborate token histories ("I have this calculator and OS; what tokens are supported and what do they look like?"), and more, provided you parse the sheets properly (and we have example Python code for doing so). A scheme for defining translations is specified but not yet implemented, and the whole structure is verified in CI; it has been engineered well enough for this use case that I do not expect any backwards-incompatible changes will be needed to add the things I want to add. Furthermore, it contains a number of substantive corrections to the TokenIDE data used by that application and by SourceCoder.
A JSON version of the data is available with all the same information, and there is a script for exporting a specific snapshot in time as a TokenIDE-compatible file (e.g. "generate me a TokenIDE sheet for a CE on OS 5.3.0").
Adriweb has a CSV with details for all of the command arguments, though to my understanding this is not tracked through history (and, like most CSVs, it's a mess <3). Some work needs to be done to merge this data in, but it will be done in a backwards-compatible way. It turns out I was confusing this with his JSON, which is distinct from the JSON mentioned above and is a confluence of token data and command data from various sources; see his remarks below.
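To make the "I have this calculator and OS, what tokens are supported?" query concrete, here is a rough sketch of consuming the JSON export. The shape and field names ("tokens", "model", "since", "display") are assumptions for illustration only; the real schema is whatever the sheets' export actually produces.

```python
import json

def parse_version(v: str) -> tuple[int, ...]:
    # "5.3.0" -> (5, 3, 0) so versions compare numerically, not lexically
    return tuple(int(part) for part in v.split("."))

def tokens_for(data: dict, model: str, os_version: str) -> dict[str, str]:
    """Map token byte(s) -> display name for tokens present on `model` at `os_version`.

    Assumes a hypothetical layout: data["tokens"] maps hex byte strings to a list
    of entries, each carrying the model it applies to, the OS version it first
    appeared in ("since"), and its rendering ("display").
    """
    wanted = parse_version(os_version)
    snapshot = {}
    for byte, entries in data["tokens"].items():
        for entry in entries:
            if entry["model"] == model and parse_version(entry["since"]) <= wanted:
                snapshot[byte] = entry["display"]
    return snapshot

with open("tokens.json") as f:
    ce_530 = tokens_for(json.load(f), "TI-84+CE", "5.3.0")
```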
The next thing is actually turning text into a token stream.
First, let's examine what we want:
- It should be predictable (i.e. have reasonable defaults for every circumstance).
- It should be interactive and configurable, for when the reasonable defaults do not describe what you want.
- It should be obvious at-a-glance what is going to be produced.
- We want import and export to round-trip exactly: every program can be exported, imported, and exported again, and you will receive the same program. This sounds easy, but I believe every single program editor today fails at it. It means overriding the reasonable defaults on import; a sketch of the check follows this list.
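To pin that last requirement down, the round-trip property is a one-line check; export and import_ here are stand-ins for whatever functions the editor actually implements.

```python
def round_trips(tokens: bytes, export, import_) -> bool:
    # Export the token stream to text, import it back, and require exactly the
    # same bytes. Running this over a corpus of real programs catches regressions.
    return import_(export(tokens)) == tokens
```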
Note the absence of backwards compatibility with existing text formats from the list above. There are too many of them, and all of them suck. A good effort should be made to do something intelligent with them-- and indeed, many of them made their way into the token sheets as variants, so properly supporting the sheets would be an intelligent thing-- but it is not a priority of mine.
Let’s learn from how the existing options with the widest adoption fail to meet these common-sense wants.
SourceCoder and TokenIDE maximally munch tokens regardless of context (though there is a little bit of extra sauce to handle backslash-escaped tokens). This is a significant problem; it does not do what the user expects. Consider an English-speaking author who writes menus in their program, which is then downloaded by someone whose calculator language is set to French. If the program contains the and token in a menu, as it is likely to do, the French calculator will display a sentence that is entirely in English except for one word in French. This is exacerbated by tokens like LEFT and RED existing only on certain calculators, and further still by the fact that the token translations may now cause some lines of text to run off the screen. You need to know all of the tokens and all of their translations to predict what your program will look like, and sparing you that is pretty obviously the entire point of having tools like this editor. What's more, if TI decides to add a new token in a future OS release, it might retroactively change what sequence of tokens a string is tokenized into. This is why having history-aware token sheets is important.
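For reference, maximal munching is just greedy longest-match over the token names. The table below is a three-entry placeholder (a real editor would build it from the sheets), and the byte values should be treated as illustrative.

```python
# Placeholder table: token name -> byte value. Values are illustrative.
TOKENS = {
    " and ": b"\x40",   # the logical operator token, which a French OS displays translated
    "A": b"\x41",
    "Menu(": b"\xE6",
}

def tokenize_greedy(text: str) -> bytes:
    """Maximal munch: at every position, consume the longest matching token name."""
    names = sorted(TOKENS, key=len, reverse=True)  # try longest names first
    out = bytearray()
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                out += TOKENS[name]
                i += len(name)
                break
        else:
            raise ValueError(f"no token matches at position {i}: {text[i:i+10]!r}")
    return bytes(out)

# "A and A" becomes letter, operator, letter -- even inside a menu string,
# which is exactly the translation problem described above.
assert tokenize_greedy("A and A") == b"\x41\x40\x41"
```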
The TI-Planet Project Builder does something smarter: it maximally munches outside of strings and minimally munches inside them. This breaks a different set of uses: TI allows you to execute code stored in strings, and if that code is minimally munched it will throw ERR:SYNTAX. Furthermore, Send( does some amount of string interpolation for some reason (I think an intelligent editor should do intelligent things here), and minimal munching breaks that too.
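The Project Builder's strategy, sketched with another toy table: greedy outside string literals, one character at a time inside them. The byte values are placeholders (real lowercase letters are two-byte tokens), so treat this purely as an illustration of the behavior, not of any actual implementation.

```python
# Placeholder table; byte values are stand-ins, not real token values.
TOKENS = {
    " and ": b"\x40",                    # operator token
    '"': b"\x2A",
    " ": b"\x29",
    "a": b"\x01", "n": b"\x02", "d": b"\x03",  # real lowercase letters are two-byte tokens
}

def tokenize_contextual(text: str) -> bytes:
    """Maximal munch outside string literals, minimal (per-character) munch inside."""
    names = sorted(TOKENS, key=len, reverse=True)
    out = bytearray()
    in_string = False
    i = 0
    while i < len(text):
        if text[i] == '"':
            out += TOKENS['"']
            in_string = not in_string
            i += 1
        elif in_string:
            out += TOKENS[text[i]]  # single-character token only
            i += 1
        else:
            for name in names:
                if text.startswith(name, i):
                    out += TOKENS[name]
                    i += len(name)
                    break
            else:
                raise ValueError(f"no token at position {i}")
    return bytes(out)

# Outside a string the operator is one token; inside, it becomes five character
# tokens -- fine for menu text, fatal for code you later hand to something that
# executes the string.
assert tokenize_contextual(" and ") == b"\x40"
assert tokenize_contextual('" and "') == b"\x2A\x29\x01\x02\x03\x29\x2A"
```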
The prevailing consensus is that a pure tokens <-> text mapping is a lost cause if you want the plain text to match what you see on-calculator; there must be an additional layer of information to resolve these problems. There are two distinct reasons why someone would want a textual format for their tokens. The first is to communicate with others; I think just exporting a list of token displays or accessibles (to use the attributes defined in the sheets) is the best way to do this. I much prefer this option because it is extremely obvious what is going on and reasonably backwards-compatible. Humans will figure out which tokens to use where, it looks pretty in a forum post, and it's reasonable to expect a token editor to make a good attempt at parsing it (minimal munching in strings except when the content is obviously code, plus maximal munching in code, counts as a good attempt in my book).
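The "for humans" direction is just a table lookup over the token stream, emitting each token's display (or accessible) name. The table here is a stub with illustrative values, and it ignores two-byte tokens, which a real implementation would of course handle.

```python
# Stub table: token byte -> display name, as it would be built from the sheets.
BYTE_TO_DISPLAY = {0x40: " and ", 0x41: "A", 0xE6: "Menu("}

def detokenize_for_humans(tokens: bytes) -> str:
    # NOTE: real token streams mix one- and two-byte tokens; a real implementation
    # checks for prefix bytes before indexing.
    return "".join(BYTE_TO_DISPLAY[b] for b in tokens)

print(detokenize_for_humans(b"\xE6\x41\x40\x41"))  # -> Menu(A and A
```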
The second use of tokens-as-text requires a bit more subtlety, and it is where round-trip accuracy is imperative. I would like to be able to use git to version my projects, and an IDE is singularly capable of both setting up the repository well and saving and loading all the files. commandblockguy and Tari independently proposed a delimited format. I'm ambivalent about using this in general, but I think this is an excellent use case for it: so long as all contributors are using the IDE, you get nice diffs and such. Our token sheets require that accessible names are unique, so I propose U+200C-separated accessible names as a good hybrid of human and machine readability for version control.
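A sketch of that proposal: because accessible names are unique and never contain U+200C, joining them with it produces text that splits back unambiguously, so the round trip is exact by construction. The two-entry name table is a placeholder, and real tokens can be more than one byte.

```python
SEP = "\u200c"  # ZERO WIDTH NON-JOINER

# Placeholder mapping: token byte -> accessible name (and the reverse).
ACCESSIBLE = {0x40: "and", 0x41: "A"}
BY_NAME = {name: byte for byte, name in ACCESSIBLE.items()}

def to_vcs_text(tokens: bytes) -> str:
    return SEP.join(ACCESSIBLE[b] for b in tokens)

def from_vcs_text(text: str) -> bytes:
    return bytes(BY_NAME[name] for name in text.split(SEP)) if text else b""

program = b"\x41\x40\x41"
assert from_vcs_text(to_vcs_text(program)) == program  # lossless by construction
```

Since the separator is zero-width, the file still reads roughly like the program in a diff viewer, which is the "hybrid of human and machine readability" part.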
Token editors should operate on the tokenized form internally and convert to text only when necessary (i.e. for displaying and exporting). This is backwards from what the existing software does, but it is the most reasonable way to ensure round-trip integrity.
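One way this might look in practice, purely as a sketch: the program object owns the token bytes (plus any per-span munching overrides recorded at import time), and every textual form is derived from it on demand rather than stored.

```python
from dataclasses import dataclass, field

@dataclass
class Program:
    """Tokens are the source of truth; text views are derived, never stored."""
    tokens: bytes = b""
    # Import-time decisions that override the defaults (e.g. "this span was
    # minimally munched"), keyed by offset into `tokens`. Purely illustrative.
    munch_overrides: dict[int, str] = field(default_factory=dict)

    def display_text(self) -> str:
        ...  # render via the sheets' display names, for the on-screen view

    def vcs_text(self) -> str:
        ...  # render as U+200C-joined accessible names, for git
```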
As for display, TokenIDE's token-underlining feature is perhaps its greatest strength. Wavejumper, KG, and I discussed this in HCWP recently, and had the idea that text strings which are being minimally munched but could be maximally munched could get a dotted underline, with a right-click to switch between minimal and maximal munching.
That's pretty much everything, though I did not provide every reason why I like the system I described; please ask me to clarify anything that is unclear. I removed a couple of sections from initial drafts for brevity!