KermMartian wrote:
Are you asking me what the tokens for those four mathematical operators are?


Code:
tAdd            EQU       70h         ; 70h '+'
tSub            EQU       71h         ; 71h '-'
tMul            EQU       82h         ; 82h  '*'
tDiv            EQU       83h         ; 83h  '/'
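
For anyone following along in Python, the four equates above could be kept in a small lookup table. This is only a sketch: the name OPERATOR_TOKENS is made up for illustration, and the byte values are simply the ones listed above.

```python
# Token bytes for the four arithmetic operators, copied from the equates above.
# OPERATOR_TOKENS is a hypothetical name, not something taken from SourceCoder.
OPERATOR_TOKENS = {
    "+": 0x70,  # tAdd
    "-": 0x71,  # tSub
    "*": 0x82,  # tMul
    "/": 0x83,  # tDiv
}

# Print each operator next to its token byte in the $XX notation used in this thread.
for symbol, value in OPERATOR_TOKENS.items():
    print("%s -> $%02X" % (symbol, value))
```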


:O Thanks much
No problem. :) After the years I've spent working on SourceCoder, I practically know these from memory at this point. :)


Hahahah, DrPython just stopped working, so I'm using stupid Eclipse :s


Code:

import re

buf = bytearray()


text = "3X+2"
token = re.split(r"(\W+)", text)
tokens = []

for i in token:
    if (i == " " and token != []):
        tokens[len(tokens)-1] += i
    else:
        tokens.append(i)

finaltext=""

tdisp = "Disp "
tprompt = "Prompt "
tX = "X"
tAdd = "+"
t2 = "2"
t3 = "3"

for i in tokens:
    if (i == tdisp):
        finaltext = finaltext+"$DE"
        buf.append(0xde)
    if (i == tprompt):
        finaltext = finaltext+"$DD"
        buf.append(0xdd)
    if (i == tX):
        finaltext = finaltext+"$58"
        buf.append(0x58)
    if (i == t3):
        finaltext = finaltext+"$33"
        buf.append(0x33)
    if (i == tAdd):
        finaltext = finaltext+"$70"
        buf.append(0x71)
    if (i == t2):
        finaltext = finaltext+"$32"
        buf.append(0x32)

print finaltext


raw_input()


This is returning:


Code:
$70$32


So, it is not recognizing the X and the +, probably because they are symbols in strings?
You have a typo (0x71 vs $70), but besides that, you're splitting on uppercase letters now? I don't think that is going to work terribly well.


Thanks for the typo, first of all.

For the program, gotta check it later
OK. Just to make sure we're on the same page, can you explain how you're trying to split the source code now?

Code:

token = re.split(r"(\W+)", text)
tokens = []

for i in token:
    if (i == " " and token != []):
        tokens[len(tokens)-1] += i
    else:
        tokens.append(i)


It is adding spaces to all tokens found, but I gotta make a new system.
But it looks like it's splitting on non-word characters, which I don't think is going to work at all. You really should revisit grabbing off maximal subchunks that match existing tokens rather than trying to split on something consistent, in my personal view.
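
To make the non-word split concrete, here is a small sketch (written for Python 3) of what that re.split call returns for the "3X+2" test string. The `known` dictionary is a hypothetical stand-in for the chain of if statements in the earlier script, using the $XX strings from that post.

```python
import re

text = "3X+2"
# \W+ matches runs of non-word characters. Letters and digits are both "word"
# characters, so "3X" is never broken apart.
pieces = re.split(r"(\W+)", text)
print(pieces)  # ['3X', '+', '2']

# Only "+" and "2" match entries in the lookup; "3X" matches nothing,
# which is why the earlier script printed $70$32.
known = {"X": "$58", "3": "$33", "+": "$70", "2": "$32"}
print("".join(known.get(p, "") for p in pieces))  # $70$32
```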


I've got two choices:

matching and searching, but I can't seem to understand matching, so I'll probably go for searching.

SourceCoder uses matching, right?
SourceCoder does indeed use maximal substring matching, which is why I'm suggesting it: I know it works. :)
'maximal substring matching', what is that?
Say you have a string like "Xmax*3", and tokens including "X", "m", "a", "x", "Xmax", "*", and "3". A maximal substring matcher would see that "Xmax*3" could be either "Xmax" "*" "3" or "X" "m" "a" "x" "*" "3", and choose the former, matching the maximal substring "Xmax" instead of "X" for the first token.
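
The "Xmax*3" example can be sketched in a few lines of Python. This is not SourceCoder's code, just a minimal illustration of the idea; the function name split_maximal and the token set are made up for the example.

```python
def split_maximal(text, tokens):
    """Greedily split text into the longest known tokens, left to right."""
    longest = max(len(t) for t in tokens)
    result = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first and shrink until something matches.
        for length in range(min(longest, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in tokens:
                result.append(candidate)
                pos += length
                break
        else:
            # No token of any length starts at this position.
            raise ValueError("no token matches at position %d" % pos)
    return result

tokens = {"X", "m", "a", "x", "Xmax", "*", "3"}
print(split_maximal("Xmax*3", tokens))  # ['Xmax', '*', '3']
```

Because the longest candidate is tried first, "Xmax" wins over "X" whenever both would match at the same position.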


That's perfect.

Gonna go to python forums and ask for help on that
I'd be happy to give you code or pseudocode to tell you how to do it, but I wanted to let you figure it out on your own. :)
If I had the PHP code I could easily do that :P

Now, I found some search vs. match articles that may help me :)
Here's some simplified code to hopefully get you started:


Code:
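function token2bin($in)
{
    //$tr (token table) and $maxtokenlength are assumed to be globals in this simplified version
    global $tr, $maxtokenlength;
    $i = 0;    //current offset into the plaintext
    $out = ''; //tokenized output so far
    while($i<strlen($in)) {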
        $thisout='';
        $thisout1=-1;
        $thisout2=-1;
        for($l=$maxtokenlength;$l>0;$l--) {
            $k = substr($in,$i,$l);
            if ((isset($tr[$k])) && ($thisout == '')) {
                $thisout = $k;
                $j = $tr[$k];
                if ($j<999){
                    $thisout1= $j;
                    $thisout2= -1;
                } else {
                    $thisout1 = intval($j/1000);
                    $thisout2 = $j-1000*$thisout1;
                }
            }
        }
        if($thisout1 == -1) {
            //no matching token
            //so yell at the user
            $i++; //and skip this character so the loop can't spin forever
        } else {
            //we have a maximal match
            $i += strlen($thisout);
            if($thisout2==-1) {
                $out .= chr($thisout1);
            } else {
                $out .= chr($thisout1) . chr($thisout2);
            }
        }
    }
    return $out;
}


Thanks, gotta study that when I can :)
Sounds good; let me know if you need any clarification on how it works, since it's more or less devoid of comments.


There are some comments, which is good, and the variable names are helpful too :)


$in and $out are respectively the plaintext input and the tokenized output. $maxtokenlength is 16, since no token is longer than 16 plaintext characters.


Outermost while() loop; loops each time a new section of plaintext is removed as a token:

Code:
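function token2bin($in)
{
    //$tr (token table) and $maxtokenlength are assumed to be globals in this simplified version
    global $tr, $maxtokenlength;
    $i = 0;    //current offset into the plaintext
    $out = ''; //tokenized output so far
    while($i<strlen($in)) {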
        $thisout='';
        $thisout1=-1;
        $thisout2=-1;


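$thisout1 and $thisout2 both start at -1, meaning "not set". Once a token matches, $thisout1 always holds a non-negative byte value; $thisout2 only becomes non-negative for two-byte tokens. Next, the for() loop that performs the actual maximal substring matching: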


Code:
        for($l=$maxtokenlength;$l>0;$l--) {
            $k = substr($in,$i,$l);
            if ((isset($tr[$k])) && ($thisout == '')) {
                $thisout = $k;
                $j = $tr[$k];


Inside that, code to match one-byte (first section) and two-byte (second section) tokens:

Code:
                if ($j<999){
                    $thisout1= $j;
                    $thisout2= -1;
                } else {
                    $thisout1 = intval($j/1000);
                    $thisout2 = $j-1000*$thisout1;
                }
            }
        }
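
As a rough Python illustration of the one-byte/two-byte branch above: the table value for a two-byte token appears to be packed as first byte times 1000 plus second byte. The helper name unpack_token and the sample value 99011 are made up for this sketch.

```python
def unpack_token(packed):
    """Split a packed table value into (first_byte, second_byte or None)."""
    if packed < 999:
        # One-byte token: the value is the byte itself.
        return packed, None
    first = packed // 1000          # same as intval($j/1000)
    second = packed - 1000 * first  # what is left over is the second byte
    return first, second

print(unpack_token(0x70))   # (112, None): the one-byte '+' token
# 99011 is a made-up packed value standing for the byte pair 99, 11.
print(unpack_token(99011))  # (99, 11)
```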


Finally, evaluate the results of the maximal substring match (MSM):

Code:
        if($thisout1 == -1) {
            //no matching token
            //so yell at the user
            $i++; //and skip this character so the loop can't spin forever
        } else {
            //we have a maximal match
            $i += strlen($thisout);
            if($thisout2==-1) {
                $out .= chr($thisout1);
            } else {
                $out .= chr($thisout1) . chr($thisout2);
            }
        }
    }
    return $out;
}
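
For the Python side of this thread, a rough port of the walkthrough above might look like the following (Python 3). It is a sketch under assumptions, not SourceCoder's actual code: the TOKENS table is a tiny hypothetical sample, the one-byte values are the ones that appeared earlier in the thread, and the packed value for "Xmax" is a placeholder that has not been checked against a real token list. Unlike the PHP, it breaks out of the inner loop at the first (longest) match instead of guarding with an empty-string check, and it raises an error when nothing matches.

```python
# Hypothetical sample table: plaintext -> packed value
# (byte1 * 1000 + byte2 for two-byte tokens).
TOKENS = {
    "Disp ": 0xDE, "Prompt ": 0xDD, "X": 0x58,
    "+": 0x70, "2": 0x32, "3": 0x33,
    "Xmax": 99011,  # placeholder two-byte value, for illustration only
}
MAX_TOKEN_LENGTH = 16  # per the post above, no token is longer than this

def token2bin(text):
    out = bytearray()
    i = 0
    while i < len(text):
        match = None
        # Longest candidate first, so "Xmax" wins over "X".
        for length in range(MAX_TOKEN_LENGTH, 0, -1):
            chunk = text[i:i + length]
            if chunk in TOKENS:
                match = chunk
                break
        if match is None:
            # The "yell at the user" case.
            raise ValueError("no token matches at offset %d" % i)
        packed = TOKENS[match]
        if packed < 999:
            out.append(packed)           # one-byte token
        else:
            out.append(packed // 1000)   # first byte
            out.append(packed % 1000)    # second byte
        i += len(match)
    return bytes(out)

print(token2bin("3X+2").hex())  # 33587032
```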
  