You stole my idea Very Happy Keep up the good work!
Me again!

This project is still being worked on, don't worry! I'm working on a detokenizer at the moment and progress is good, if a little slow.

So far, I have started a library that detokenizes strings of bytes given to it from the calculator. Tokens get dumped into a dictionary, and tokens that are the first part of a double byte get marked as "DOUBLE" - so my program has a means of detecting double bytes and acting accordingly. Now, I am up to making a detokenizer function, and that's about it.
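For illustration, the dictionary might look something like this (a sketch only - the byte values and the nested layout here are made up, not necessarily how my actual table is laid out):

```python
# Sketch of the token dictionary described above (byte values are
# hypothetical examples): plain one-byte tokens map straight to their
# strings, while first bytes of two-byte tokens are marked "DOUBLE"
# and carry a nested table keyed on the second byte.
tokens = {
    0xDE: "Disp ",                  # assumed one-byte token
    0xBB: ("DOUBLE", {0x31: "e"}),  # assumed two-byte prefix
}

entry = tokens[0xBB]
if isinstance(entry, tuple) and entry[0] == "DOUBLE":
    print(entry[1][0x31])  # the second byte selects the real token
```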

A long while ago, Kerm posted this pseudocode. I know what ord() does (it finds the ASCII code point of a character); however, I am unsure of its role in this snippet of code, and also whether it will be needed in my detokenizer at all.


Code:
index = 0
outString = ""
while index < len(byteString):
    thisByteVal = ord(byteString[index])
    if thisByteVal in oneByteTokens:
        # one-byte token: translate it directly
        outString += oneByteTokens[thisByteVal]
    elif thisByteVal in twoByteTokens and index + 1 < len(byteString):
        # first byte of a two-byte token: consume the second byte too
        nextByteVal = ord(byteString[index + 1])
        outString += twoByteTokens[thisByteVal][nextByteVal]
        index += 1
    index += 1


More progress soon!
The short answer is that you have a two-dimensional array of token byte to token string definitions. Your first list is for one-byte tokens, translating the one-byte token value to a string. List elements are indexed by a number, and the byte value of a token byte can be 0-255, so it makes sense to grab oneByteTokens[ord(byte_of_tokenized_data)] to get the one-byte token string corresponding to the value (0-255) of the given character/byte of the tokenized data. Does that make sense? You only need it because the bytes/characters of the tokenized data are usually represented as characters rather than numeric values.
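For instance (with a made-up table entry):

```python
# One byte of tokenized data arrives as a character; ord() converts it
# to its numeric value (0-255), which is then the index into the
# one-byte token table.
oneByteTokens = {0xDE: "Disp "}  # hypothetical single entry
byte_of_tokenized_data = "\xde"  # the byte, represented as a character

print(oneByteTokens[ord(byte_of_tokenized_data)])  # -> Disp
```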
[UPDATE #11 - 16/01/14] - New Year, New Goals
So, to mark the occasion, this is the first development log of 2014. Exciting, huh? The next major step in the development of WAti is to get all of the code into a remote repository, because it is becoming quite hard to develop without some degree of organisation. In other words, I need to be able to code in a headache-free manner.

To do this, I have adopted BitBucket and the version control system hg (Mercurial), and, like so many steps in this project, I appear to have reached another stumbling block.

Using TortoiseHg, I have attempted to clone my newly created repo so I can chuck my semi-working code into it and begin to edit, commit and such. This is all great, but when TortoiseHg runs:

Code:
hg clone https://ElectronicsGeek@bitbucket.org/ElectronicsGeek/wati

TortoiseHg returns:

Code:
HTTP Error: 404 (Not Found)

Any ideas?

I definitely feel like a VCS will enable me to be more productive and gain more momentum with this project.
What if you try it without the username@ part? When I go to that URL (without the username) in a browser, I get "You do not have access to this repository", which presumably at least means that that's the correct URL for it. Luckily, the first Google result for "bitbucket clone 404" gives the answer:

https://bitbucket.org/tortoisehg/hgtk/issue/1874/cant-clone-private-repos-with-tortoisehg
After a chat on bitbucket support, I found out I accidentally configured my repository as git rather than hg. Now that's sorted out, I'm just cloning all the files in.
BUMP!
Currently taking the form of a Windows 8 sticky note on my desktop is the WAti development checklist. Here's where we are:

Code:
Programming:
"Talk Less Do More!"
WAti:
   [Y] Sort BB Repo Out
   [] Get Detokenizer Working #Current task. Making OK progress.
   [] Make Tokenizer
   [] Get I/O with server working
   [] Make Client Work
   [] Final Debug
   [] Distribution!
All Easier Said Than Done!
Thanks for posting this on my prompting. I'm really tempted to ask you to add me as a committer for the repository, but as you pointed out on IRC, there's a serious danger of me just finishing up the project, and I don't want to steal your project. Wink Best of luck!
[UPDATE #12 - 4/2/14] - The Detokenizer Is Complete
After months of fumbling around with XML parsing and detokenization code, I am very pleased to announce that WAti now has a working detokenizer, in the form of a Python module I have written named 'titokens'.

Here is the result:

[screenshot of the detokenizer's output]

I had an epiphany on Monday night and somehow, everything just clicked. I definitely feel I have gained some significant momentum and certainly some more, much needed motivation.

It also means another item has been completed on the checklist, which now looks like:

Code:
WAti:
   [Y] Sort BB Repo Out
   [Y] Get Detokenizer Working
   [] Make Tokenizer
   [] Get I/O with server working
   [] Make Client Work
   [] Final Debug
   [] Distribution!


When 'titokens' has a tokenization method, I am considering distributing it, which will make Global CALCnet programs that little bit easier to make - hopefully resulting in an abundance of new services that utilise gCn.
Great job, keep up the good work. I'm glad to hear that the detokenization idea finally clicked for you, and please let me/us know how we can help push you towards getting the tokenization to work. Remember, just greedily chomp as much as possible, see if it's a valid token, then gradually remove letters from the "hypothesis" until it matches a known token.

"Pause X+3" -> Try "Pause X+3", not a token
"Pause X+3" -> Try "Pause X+", not a token
"Pause X+3" -> Try "Pause X", not a token
"Pause X+3" -> Try "Pause ", token!
"X+3" -> Try "X+3", not a token
"X+3" -> Try "X+", not a token
"X+3" -> Try "X", token!
"+3" -> Try "+3", not a token
"+3" -> Try "+", token!
"3" -> Try "3", token!
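In (hypothetical) code, with made-up token byte values, that greedy chomping might look like:

```python
# Greedy "maximal munch" tokenizer sketch. The token table and its
# byte values below are invented for illustration, and this handles
# one-byte tokens only.
tokens = {"Pause ": 0xE1, "X": 0x58, "+": 0x70, "3": 0x33}

def tokenize(text):
    out = bytearray()
    while text:
        # Try the longest remaining prefix first, then shrink it.
        for end in range(len(text), 0, -1):
            if text[:end] in tokens:
                out.append(tokens[text[:end]])
                text = text[end:]
                break
        else:
            raise ValueError("no token matches %r" % text)
    return out

print([hex(b) for b in tokenize("Pause X+3")])
# -> ['0xe1', '0x58', '0x70', '0x33']
```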
Right. If I can, I'll have a look into that tomorrow. From the information you have given me, I gather that tokenization will be more of a brute-force algorithm. Thanks a lot! Smile
Are there any plans for image support?
Not as yet. I need to get core functionalities working first. Furthermore, it would be a very hard task to pull off, especially in BASIC.
ElectronicsGeek wrote:
Not as yet. I need to get core functionalities working first. Furthermore, it would be a very hard task to pull off, especially in BASIC.
Perhaps not as hard as you would think: your Python program would resize and threshold images, send them as a series of frames at 8 pixels per byte, and your receiving program would draw them out on the screen.
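A sketch of the packing step - the frame format here is just a guess at what that might look like:

```python
# Pack one row of a 1-bit image into bytes, 8 pixels per byte,
# most significant bit first.
def pack_row(pixels):
    out = bytearray()
    for i in range(0, len(pixels), 8):
        byte = 0
        for bit, p in enumerate(pixels[i:i + 8]):
            if p:
                byte |= 0x80 >> bit
        out.append(byte)
    return out

print(pack_row([1, 0, 1, 0, 1, 0, 1, 0]))  # -> bytearray(b'\xaa')
```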
[UPDATE #13 - 4/3/14] - Tokenizer Is Starting To Take Shape
More Progress:

This is the foundation of a working tokenizer. It detects whether characters are in the 'tokens' dictionary. All it needs to do now is handle double byte tokens and actually bung the tokens into a string of bytes that the calculator can read, ready to be sent over gCn.

Code:
WAti: 
   [Y] Sort BB Repo Out 
   [Y] Get Detokenizer Working 
   [25%] Make Tokenizer
   [] titokens OOP overhaul
   [] Get I/O with server working 
   [] Make Client Work 
   [] Final Debug 
   [] Distribution!


Haha, just realised that this was submitted one month after the detokenizer was completed Smile

EDIT: I can't count Very Happy
Keep up the good work! What's holding you up with the tokenizer at the moment? Anything I/we can do to help?
Life Razz

Nah, just kidding. The next step is pushing the tokens into a string. How should I go about doing that? I think the tokens are stored as ints in the dictionary...

Could I '\x' the byte definitions into a string, or should I use something more elaborate like a 'bytearray' datatype?
ElectronicsGeek wrote:
Life Razz

Nah, just kidding. The next step is pushing the tokens into a string. How should I go about doing that? I think the tokens are stored as ints in the dictionary...

Could I '\x' the byte definitions into a string, or should I use something more elaborate like a 'bytearray' datatype?
It ends up being pretty much the same thing. Your options include '\x1A', chr(0x1A), and bytearray([0x1A]).
A bytearray is essentially a string with the advantage that you get to manage the encoding manually. It is hands-down the best option for something like this.
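To make the three options from above concrete:

```python
# Three ways to spell the single byte 0x1A. (In Python 2 all three
# compare equal as strings; in Python 3 the first two are text strings
# while the bytearray holds raw bytes.)
a = '\x1a'
b = chr(0x1A)
c = bytearray([0x1A])

print(a == b)        # -> True
print(c == b'\x1a')  # -> True
```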
Awesome. I'll endeavour to make some progress on that front this weekend. Since my byte definitions are stored as ints, I suspect all I'll need to do is:

Code:
bytearray([byte_definition_from_tokens_dict])

Cheers Smile
  