(Our sheets have been mentioned and discussed in plenty of places over the past year, but only recently did I realize that they don't have a dedicated thread. So, here we are.)

The token sheets are XML files containing copious technical information about every token on the 8X calcs. The sheets were derived from and inspired by TokenIDE's sheets, but are intended to be a more generic reference tool for tokenization and other tasks.

Unlike programs written on a computer, which are made of characters, TI-BASIC programs are composed of tokens: indivisible chunks of bytes that correspond to the many commands you see on-calc. If your cursor "skips" over it in the editor, it's a token. Tokenization is the process of converting characters (which are easy to write on your computer) into tokens (which the calculator understands). Detokenization is the reverse process of rendering all the bytes in a TI-BASIC program as text people can read.
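
To make that concrete, here is a toy detokenizer (a sketch only: the byte-to-name table below is tiny and illustrative, and a real tool needs the sheets' full data):

Code:
# Toy detokenizer: each byte of the program body is looked up in a
# byte -> name table. The few values below follow common 83+ token
# listings, but treat them as illustrative; the sheets are the
# authoritative reference.
NAMES = {
    0xDE: "Disp ",
    0x2A: '"',
    0x48: "H",
    0x49: "I",
}

def detokenize(data: bytes) -> str:
    return "".join(NAMES.get(b, f"[${b:02X}]") for b in data)

print(detokenize(bytes([0xDE, 0x2A, 0x48, 0x49])))  # -> Disp "HI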

Both of these tasks require complete knowledge of the token set, across time and across every language you wish to support, so that future token changes cannot break existing programs. Tokenization additionally requires care to deal with ambiguous and unintended parses. The sheets' comprehensive token information and reference [de]tokenization implementations (discussed below) are thorough and carefully constructed to support a variety of use cases.
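
For a taste of the ambiguity problem, here is a greedy longest-match sketch: the text "Xmin" must become the single window-variable token, not X followed by shorter matches. All byte values below are placeholders, not the real ones:

Code:
# Greedy longest-match tokenizer sketch. Byte values are placeholders;
# a real implementation would load them from the sheets and handle
# context (strings, list names, ...) that this toy ignores.
TOKENS = {
    "Xmin": b"\x63\x0A",  # placeholder two-byte token
    "min(": b"\x01",      # placeholder
    "X":    b"\x58",
    "m":    b"\x02",      # placeholder
    "i":    b"\x03",      # placeholder
    "n":    b"\x04",      # placeholder
}

def tokenize(text: str) -> bytes:
    out, i = bytearray(), 0
    while i < len(text):
        # Prefer the longest name that matches here; "Xmin" beats "X".
        match = max((name for name in TOKENS if text.startswith(name, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no token matches at offset {i}: {text[i:]!r}")
        out += TOKENS[match]
        i += len(match)
    return bytes(out)

assert tokenize("Xmin") == b"\x63\x0A"  # one token, not X + m + i + n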

Every token's history is kept in one file, and changes like token addition, removal, and renaming are recorded in an easily parsable way. This allows applications to use just one sheet while targeting anything from the earliest 82 to the latest CE. Each token tag includes (see the parsing sketch after this list):
  • A full linear history
  • An accessible detokenization name (how do people generally type this token on a keyboard?)
  • Other extant variants, which are names recognized by other tools
  • Display information that captures how the token is rendered on a calculator (both in TI-Font bytes and as Unicode approximations for convenience)
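
For instance, consuming a sheet could look something like the sketch below. The element and attribute names here are illustrative guesses, not the sheets' actual schema; the repository's provided Python scripts are the supported route:

Code:
# Illustrative only: walks a sheet with the standard library. The tag
# and attribute names ("token", "version", "lang", "accessible") are
# guesses for the sake of the example; consult the actual schema, or
# use the repository's provided Python scripts instead.
import xml.etree.ElementTree as ET

root = ET.parse("8X.xml").getroot()  # hypothetical file name

for token in root.iter("token"):
    value = token.get("value")              # e.g. the token's bytes
    for version in token.iter("version"):   # the token's linear history
        for lang in version.iter("lang"):
            accessible = lang.findtext("accessible")
            print(value, version.get("since"), accessible)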

We highly encourage you to use the sheets in your next project, as we strive for them to be a valuable and powerful standard reference for the 82/83/84-series calculators. There are several straightforward ways to do so:
  • Clone the main branch directly (for easy scripting using the provided Python scripts, which we guarantee will provide a stable API even if the sheets change¹).
  • Clone the built branch, which contains validated copies of the sheets, as well as the same data in JSON format. We also produce a TokenIDE-compatible sheet for the latest version of every calculator model. Adding this branch as a git submodule is recommended for using the sheets in any long-term project.
  • For Python projects that just need [de]tokenization, tivars_lib_py's tokenizer is extremely robust and has first-class support for these sheets (see the sketch after this list).
  • For Rust projects, the titokens crate has a complete sheet parser and a working detokenizer, but only a half-baked tokenizer.
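
As a taste of the Python route, tokenizing with tivars_lib_py looks roughly like this (a minimal sketch; its README has the authoritative, up-to-date API):

Code:
# Minimal sketch of [de]tokenizing with tivars_lib_py; see the
# project's README for the authoritative API.
from tivars.types import TIProgram

program = TIProgram(name="HELLO")
program.load_string('Disp "HELLO WORLD"')  # tokenize from source text
print(program.string())                    # detokenize back to text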

Errors, issues, and suggestions are more than welcome in this thread and can also be directed to GitHub or our Discord server. This project is nearly two years old, and we continue to include more information as it becomes available. Our next big project with the sheets is adding translations and developing a new sheet for font information; we would love your help!

This project had many contributors:
  • Every contributor to the original TokenIDE sheet (Kerm, merth, and tifreak were listed on the version we used).
  • iPhoenix and kg583, who populated the sheets and provided a reference implementation and core ideas for sheet layout (most of which have survived several revisions).
  • Tari, who helped us cleanly incorporate more data.
  • LogicalJoe, who uncovered and fixed many token history omissions.
  • The rest of the Toolkit crew (particularly Adriweb and womp), who helped spot errors and contributed to discussions that helped shape the final form of these sheets.

¹ This post marks sheet version 1.0; we are extremely unlikely to change the format again in a backward-incompatible way.
I extracted the translations for several languages (tool here, output here). Merging these into our sheet is “all” that needs to be done for, say, French-language support, save tracking how these translations have changed through history, deciding on accessible representations, and integrating these into our existing data. If you choose to pick up this task, let me know!

I am completely sick of doing manual tasks for every single TI-BASIC token. This, though, seems mostly automatable. :)
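
A merge script could start as small as this sketch; every name in it (the input files, the lang element and its attributes) is a hypothetical placeholder rather than the sheets' real schema:

Code:
# Hypothetical merge sketch: the input files, the "lang" element, and
# its attributes are all placeholders, not the sheets' real schema.
import json
import xml.etree.ElementTree as ET

translations = json.load(open("fr.json"))  # hypothetical extractor output
tree = ET.parse("8X.xml")                  # hypothetical sheet file

for token in tree.getroot().iter("token"):
    name = translations.get(token.get("value"))
    if name is not None:
        lang = ET.SubElement(token, "lang", code="fr")
        lang.text = name

tree.write("8X-fr.xml")
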
Oh, also, for C++ and JS/WASM users who may be using tivars_lib_cpp: don't worry, I plan to switch from my old token definitions CSV to this sheet :) (this also means the TI-Planet PB will use it).
Quote:
For Rust projects, the titokens crate has a complete sheet parser and a working detokenizer, but only a half-baked tokenizer.
Uh oh, name collision! My implementation has both tokenization (including at compile time) and detokenization, but currently consumes a Tokens-schema token sheet, though I'd like to switch it over to the toolkit XML format.

I also wish your crate had a repository URL in its metadata, because I don't know where to find it.
  