I strongly, strongly recommend that you play with that code until you understand how it works and what it's doing before you try applying it to TI program tokenization. Smile
KermMartian wrote:
I strongly, strongly recommend that you play with that code until you understand how it works and what it's doing before you try applying it to TI program tokenization. Smile


I have only read it so far :S
And what have you gleaned from it so far? Do you understand the object-oriented approach it's using? Do you see the state machine that he (she?) is setting up to process the tokenization?
I'll get on that code later, sorry.


Code:

// tokens.cs
using System;
// The System.Collections namespace is made available:
using System.Collections;

// Declare the Tokens class:
public class Tokens : IEnumerable
{
   private string[] elements;

   Tokens(string source, char[] delimiters)
   {
      // Parse the string into tokens:
      elements = source.Split(delimiters);
   }

   // IEnumerable Interface Implementation:
   //   Declaration of the GetEnumerator() method
   //   required by IEnumerable
   public IEnumerator GetEnumerator()
   {
      return new TokenEnumerator(this);
   }

   // Inner class implements IEnumerator interface:
   private class TokenEnumerator : IEnumerator
   {
      private int position = -1;
      private Tokens t;

      public TokenEnumerator(Tokens t)
      {
         this.t = t;
      }

      // Declare the MoveNext method required by IEnumerator:
      public bool MoveNext()
      {
         if (position < t.elements.Length - 1)
         {
            position++;
            return true;
         }
         else
         {
            return false;
         }
      }

      // Declare the Reset method required by IEnumerator:
      public void Reset()
      {
         position = -1;
      }

      // Declare the Current property required by IEnumerator:
      public object Current
      {
         get
         {
            return t.elements[position];
         }
      }
   }

   // Test Tokens, TokenEnumerator

   static void Main()
   {
      // Testing Tokens by breaking the string into tokens:
      Tokens f = new Tokens("This is a string, ready to be tokenized.",
         new char[] {' ','-'});
      foreach (string item in f)
      {
         Console.WriteLine(item);
      }
      Console.ReadLine();
   }
}


Also found this. Python code time.
But you're still not realizing that the token process is independent of any delimiters. Smile TI-BASIC has no delimiters between its tokens, so the following code is useless:


Code:

   Tokens(string source, char[] delimiters)
   {
      // Parse the string into tokens:
      elements = source.Split(delimiters);
   }
KermMartian wrote:
But you're still not realizing that the token process is independent of any delimiters. Smile TI-BASIC has no delimiters between its tokens, so the following code is useless:


Code:

   Tokens(string source, char[] delimiters)
   {
      // Parse the string into tokens:
      elements = source.Split(delimiters);
   }



Yes, while running and testing that code I found out it could never be used.

I need a way of searching the text, like a find command, a search, or a match function.

This:


Code:
Disp "Disp "


Do you have anything in SourceCoder that prevents text inside quotes from ever being treated as a function?
No, because I don't consider there to be a point. There's no reason not to make the second Disp a Disp token rather than D-i-s-p; it saves four bytes, and doesn't break anything. Smile
Well, it could, if you're needing to do some substrings on a string that has, say, "Disp Input Other Stuff", and you want to be able to get the individual characters of the word.

Also, doesn't it save seven bytes? Lowercase letters are two-byte tokens, right?
merthsoft wrote:
Well, it could, if you're needing to do some substrings on a string that has, say, "Disp Input Other Stuff", and you want to be able to get the individual characters of the word.

Also, doesn't it save seven bytes? Lowercase letters are two-byte tokens, right?


Lowercase are indeed larger than uppercase.


Code:
Disp "Disp "


Do your compilers compile this as a program that displays "Disp "?
I'm not sure what you're asking. They will create a program that displays "Disp ", yes. Just like if you made a program on your calc that had Disp "Disp ".
What SourceCoder will do, and what Tokens will do unless you modify the string, is create a program whose tokens are:

Code:
DE2ADE2A

That is,

Code:
Disp "Disp "

It will display the string "Disp ", but the program data will contain the single token for "Disp ", not the characters for each letter. In Tokens, if you have

Code:
Disp "Disp\ "

It will tokenize it to:

Code:
DE2A44BBB8BBC3BBC0292A

which will also display the string "Disp ", but contains the actual character tokens in it.
Indeed. The only difference between SourceCoder and Tokens in that regard is that SourceCoder doesn't provide a facility to force the literals like that.
Well, it is weird indeed. Now, the tokenizer.

Should I use a find/search/match command?
ScoutDavid wrote:
Well, it is weird indeed. Now, the tokenizer.

Should I use a find/search/match command?
I don't think that would help, because find("Disp ") would preclude find("i"), you know?
ScoutDavid wrote:
Well, it is weird indeed.
How is it weird? What would you expect it to do?

ibid. wrote:
Should I use a find/search/match command?
Probably not, if you're suggesting what I think you are. Doing a find/replace through the entire document for each string would be a horribly slow way to go about this. And you'd probably have to go in size order of the strings. That way you don't convert the "i" in Disp when you actually need the token "Disp ".
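The size-order requirement can be seen in a few lines of Python. This is a toy sketch of the naive find/replace approach being warned against: the token names and hex values below are a tiny hypothetical stand-in, not the real TI-83+ token table.


Code:

def naive_tokenize(src, table):
    """Replace token text with hex strings, longest token names first.

    `table` maps token text to hex strings; the entries used here are
    illustrative, not the real TI-83+ token table.
    """
    out = src
    # Longest-first ordering is essential: replacing "i" first would
    # turn "Disp " into "D<hex>sp " and destroy the longer match.
    for name in sorted(table, key=len, reverse=True):
        out = out.replace(name, table[name] + " ")
    return out.strip()

table = {"Disp ": "DE", "i": "BBB8", "s": "BBC3"}
print(naive_tokenize("Disp is", table))  # DE BBB8 BBC3


Even with the ordering fixed, this stays fragile: a replacement hex string can itself contain letters that match a later token name, which is part of why the find/replace route is a poor fit here.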
Substring maximum matcher is the only option, right?
ScoutDavid wrote:
Substring maximum matcher is the only option, right?
It's not the only option, but it's the best option. Doing a reverse-length-ordered find/remove would work but take at least O(n^3) compared to O(n^2) for the maximal substring matcher.
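For reference, a minimal Python sketch of that maximal-substring matcher. The token table is a four-entry stand-in: the DE/2A/41 byte values come from the Disp "A" example in this thread, and a real tokenizer would load the full table of one- and two-byte tokens.


Code:

# Tiny illustrative subset of the TI-83+ token table.
TOKENS = {
    "Disp ": b"\xDE",
    '"':     b"\x2A",
    "A":     b"\x41",
    "i":     b"\xBB\xB8",
}

def tokenize(source):
    out = bytearray()
    pos = 0
    longest = max(map(len, TOKENS))
    while pos < len(source):
        # Greedily try the longest possible match at this position first.
        for size in range(min(longest, len(source) - pos), 0, -1):
            piece = source[pos:pos + size]
            if piece in TOKENS:
                out.extend(TOKENS[piece])
                pos += size
                break
        else:
            raise ValueError("no token matches at position %d" % pos)
    return bytes(out)

print(tokenize('Disp "A"').hex().upper())  # DE2A412A


Because the scan always takes the longest match at each position, the "i" inside Disp can never be tokenized on its own, which is exactly the failure mode the find/replace approach has to dodge.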
Well, I am now trying something completely different!


Code:

Disp "A"


This is a BASIC program.


Code:
DE2A412A


This is the Tokenized code of it.


Code:
program = bytearray()
program.extend(bytes.fromhex('DE2A412A'))


This will make a bytearray and store the token values in it.


I will not convert "Disp "A"" yet; I will try something different this time: the actual .8xp file MAKER. I will try to make a .8xp file. I have the byte values of the program, but have no idea where to go from here :S

http://tibasicdev.wikidot.com/68k:tokenization

I read this too Smile
The 68k tokenization information will likely not help you, because they can accept both plain-text and tokenized files, and tokenize on-the-fly. You have the "byte value" of what?
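For the .8xp step itself, here is a hedged sketch of wrapping token bytes in a TI-83+ linking file, based on my reading of the TI linking guide: an "**TI83F*" signature, a 42-byte comment, a length-prefixed variable entry, and a 16-bit checksum over the data section. Treat every field offset as an assumption and verify against a file exported by TI Connect or SourceCoder before trusting it.


Code:

import struct

def make_8xp(name: str, tokens: bytes, comment: str = "Sketch, unverified") -> bytes:
    """Wrap raw token bytes in a TI-83+ .8xp container.

    Field layout is my reading of the TI linking guide, not a verified
    reference; check it against a known-good .8xp before relying on it.
    """
    # Variable data = 2-byte token-stream length, then the tokens.
    var_data = struct.pack("<H", len(tokens)) + tokens
    entry = struct.pack("<HH", 0x0D, len(var_data))          # header len, data len
    entry += bytes([0x05])                                   # type ID: program
    entry += name.upper().encode("ascii")[:8].ljust(8, b"\x00")
    entry += bytes([0x00, 0x00])                             # version, flag (0x80 = archived)
    entry += struct.pack("<H", len(var_data)) + var_data     # data len repeated, then data
    header = b"**TI83F*" + bytes([0x1A, 0x0A, 0x00])
    header += comment.encode("ascii")[:42].ljust(42, b"\x00")
    header += struct.pack("<H", len(entry))                  # data section length
    checksum = struct.pack("<H", sum(entry) & 0xFFFF)        # sum of data section, mod 2^16
    return header + entry + checksum

# Disp "A" -> token bytes DE 2A 41 2A, from earlier in the thread.
data = make_8xp("HELLO", bytes.fromhex("DE2A412A"))
open("HELLO.8xp", "wb").write(data)


If the resulting file loads in an emulator such as Wabbitemu and lists as prgmHELLO, the offsets are right; if not, comparing it byte-by-byte against a TI Connect export will show which field is off.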
  