Hey there everyone. Here's the deal on my summer plans at this point. I'd like to make significant progress on FreeBuild - in particular, finish my reworking of the 3Space rendering pipeline to use VBOs and tie the (95% complete) LDraw parsing capabilities into that system. I may also try to implement batching of some kind, but that isn't nearly as mission-critical. My other FreeBuild-related goal for the summer is to strip out the physics/collision code and replace it with the Bullet collision library.

For side projects, I'd like to polish up my Python card game library and finish my implementation of Sabacc with networked multiplayer. The game engine has already been abstracted into a client/server model with general-purpose Player objects, each responsible for polling for user input via whatever mechanism its implementation chooses.
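To give a rough idea of that abstraction (class and method names here are illustrative guesses, not the library's actual API):

Code:
# Hedged sketch of the Player abstraction described above: the engine
# only sees the base interface, and each subclass decides how input is
# actually polled. Names are illustrative, not the real API.
class Player:
    def poll_move(self, game_state):
        raise NotImplementedError

class ConsolePlayer(Player):
    def poll_move(self, game_state):
        return input("Your move: ")

class NetworkPlayer(Player):
    def __init__(self, connection):
        self.connection = connection

    def poll_move(self, game_state):
        self.connection.send_state(game_state)   # hypothetical wire call
        return self.connection.receive_move()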

If Kerm gets the runprog chaining working in a reasonable manner, I plan to finish up my Doors-based thumbdrive browser + program launcher.

Finally, this weekend I put about 16 hours into developing a new compiled programming language for z80 graphing calculators (developed on the computer, with a Python-based compiler). I hammered out the syntax, which resembles C with some Pythonisms and z80-specific quirks thrown in, and a few changes to the operators so that parsing a source file requires less contextual information. I'll post more information on this when I have a workable demo, but I've implemented the symbol lookup routines, and my parsing routines are happily generating ASTs (Abstract Syntax Trees) for about 75% of my language features. Also, at least for the moment, I'm calling it Stickle.
Good projects you have there!

elfprince13 wrote:

Finally, this weekend I put about 16 hours into developing a new compiled programming language for z80 graphing calculators (developed on the computer, with a Python-based compiler). I hammered out the syntax, which resembles C with some Pythonisms and z80-specific quirks thrown in, and a few changes to the operators so that parsing a source file requires less contextual information. I'll post more information on this when I have a workable demo, but I've implemented the symbol lookup routines, and my parsing routines are happily generating ASTs (Abstract Syntax Trees) for about 75% of my language features. Also, at least for the moment, I'm calling it Stickle.

Nice, I would love to see its capabilities soon. Smile


For my summer, after exams, I am thinking of finishing some games in assembly for the z80. Wink
Elfprince, I'm especially looking forward to the FreeBuild progress, particularly the LDraw-based things, and to having more peeps to build with. Hopefully we can draw up some train plans as well. Smile
elfprince13 wrote:
Finally, this weekend I put about 16 hours into developing a new compiled programming language for z80 graphing calculators (developed on the computer, with a Python-based compiler). I hammered out the syntax, which resembles C with some Pythonisms and z80-specific quirks thrown in, and a few changes to the operators so that parsing a source file requires less contextual information. I'll post more information on this when I have a workable demo, but I've implemented the symbol lookup routines, and my parsing routines are happily generating ASTs (Abstract Syntax Trees) for about 75% of my language features. Also, at least for the moment, I'm calling it Stickle.


What are you using for the lexer and parser?

And you should open-source it and host it on Google Code or similar; I just might help with this one Wink

EDIT: Is this why you were asking me those regex questions? If so, you fool! You should use PLY: http://www.dabeaz.com/ply/
Kllrnohj wrote:
What are you using for the lexer and parser?

Kllrnohj wrote:
EDIT: Is this why you were asking me those regex questions? If so, you fool! You should use PLY: http://www.dabeaz.com/ply/

That's boring. I'm having fun implementing shunting-yard, constant folding, and tokenizers (okay, so I'm cheating on that last one by using re, but I'm probably going to end up implementing Thompson's NFA algorithm for $hits and giggles).
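Constant folding, for instance, is a fun one; a minimal sketch over an invented tuple-based AST (not my actual representation):

Code:
# Minimal constant-folding sketch over a tuple-based AST. The node
# shapes ("num", value) and ("binop", op, l, r) are invented for the
# example, not the compiler's real representation.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    if node[0] == "binop":
        op, left, right = node[1], fold(node[2]), fold(node[3])
        if left[0] == "num" and right[0] == "num":
            return ("num", OPS[op](left[1], right[1]))
        return ("binop", op, left, right)
    return node

# fold(("binop", "+", ("num", 2), ("binop", "*", ("num", 3), ("num", 4))))
# => ("num", 14)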

Kllrnohj wrote:
And you should open-source it and host it on Google Code or similar; I just might help with this one Wink


This is pretty likely.
Argh, you're using regex for tokenization/parsing instead of an FST-based parser and lexer? Sad That makes me sad.
KermMartian wrote:
Argh, you're using regex for tokenization/parsing instead of an FST-based parser and lexer? Sad That makes me sad.



No, just for tokenization at this point. My current setup should be pretty much equivalent to using an FST though, because I have the regular expression representing the FSM's accepted input, and a dictionary mapping input to output.
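Concretely, the setup is something like this (token classes invented for the example; the real table is longer):

Code:
# Rough sketch of the re-based setup described above: one master
# pattern of named alternatives, with the group name acting as the
# dictionary key mapping input to output. Token classes are invented.
import re

TOKEN_SPEC = [
    ("NUMBER", r"0x[0-9A-Fa-f]+|\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"<=>|[+\-*/%<>=!&|^~@#]+"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(source):
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

# list(tokenize("x <=> 0xFF"))
# => [("IDENT", "x"), ("OP", "<=>"), ("NUMBER", "0xFF")]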

I'm using Dijkstra's shunting-yard algorithm + some additional variables for state information to parse the token stream into a syntax tree.
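The core of shunting-yard is pleasantly small; a minimal sketch (precedence table and token format invented for the example):

Code:
# Minimal shunting-yard sketch producing RPN; pushing tree nodes onto
# the output stack instead of strings yields the syntax tree directly.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    output, ops = [], []
    for tok in tokens:
        if tok in PREC:
            while ops and ops[-1] != "(" and PREC[ops[-1]] >= PREC[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the "("
        else:
            output.append(tok)  # operand
    while ops:
        output.append(ops.pop())
    return output

# to_rpn(["2", "+", "3", "*", "4"]) => ["2", "3", "4", "*", "+"]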
Grant Approves. You should also come visit me this summer. Little Rock has /slightly/ more to do than Burlington. Do you estimate that your compiled language will be on par with or faster than Axe or BBC Basic?
Pseudoprogrammer wrote:
Grant Approves. You should also come visit me this summer. Little Rock has /slightly/ more to do than Burlington. Do you estimate that your compiled language will be on par with or faster than Axe or BBC Basic?

Unfortunately my income is pretty minimal, meaning my travel budget is also pretty minimal. That would be teh 1337sauce though.

Also, I have no idea how it'll do performance-wise, but I would think that as a compiled language it'll do better than BBC Basic.
*Bump* - I just found out about this, and that's pretty awesome Smile ... Is this still something you are going to finish? It sounds similar to my project (which I first posted on Cemetech about a month after your last post here), so perhaps we could discuss similarities/differences? You clearly know your compiler material though (although the re does make me cringe with Kerm).
shkaboinka wrote:
*Bump* - I just found out about this, and that's pretty awesome Smile ... Is this still something you are going to finish? It sounds similar to my project (which I first posted on Cemetech about a month after your last post here), so perhaps we could discuss similarities/differences? You clearly know your compiler material though (although the re does make me cringe with Kerm).


I haven't worked on this since then, to be quite honest. If I picked it up again now I'd probably implement it differently, since I've played around with lexing/parsing a bit more than I had at the time, have access to a Python-friendly yacc implementation, and have figured out some of the idiosyncrasies of Plex.
elfprince13 wrote:
shkaboinka wrote:
*Bump* - I just found out about this, and that's pretty awesome Smile ... Is this still something you are going to finish? It sounds similar to my project (which I first posted on Cemetech about a month after your last post here), so perhaps we could discuss similarities/differences? You clearly know your compiler material though (although the re does make me cringe with Kerm).

I haven't worked on this since then, to be quite honest. If I picked it up again now I'd probably implement it differently, since I've played around with lexing/parsing a bit more than I had at the time, have access to a Python-friendly yacc implementation, and have figured out some of the idiosyncrasies of Plex.

Well, I certainly wouldn't urge you to abandon it; but since I've "restarted" OPIA a few times (including scrapping a ton of mostly finished code), maybe some discussion would either spark another shot for you or at least help me grab some helpful aspects.

One of my major goals with OPIA is to select JUST the right features to give the language power that just cannot be replaced well with other features (e.g. function-pointers, coroutines, closures, polymorphism of objects, etc.). The best way I have found to do this is to analyze other languages to see what one offers that is simply lacking in others. For example, Java does not have first-class functions (ARG!), and few languages have coroutines. Some of the languages that I found very influential for this reason were:
C/C++/Java - The main "Big" languages that I based it off of in the first place. If I could "do what they do" in the most efficient way for the z80...
C# - Designed based on lessons learned from Java and C++; VERY clean setup for OOP, but also open to low-level stuff (e.g. still gives control of when to use references or not).
Lua - The idea of the coroutine (but I thought Lua was too wordy about it).
JavaScript - The idea of closures and first-class functions.
Go - Google's Go is to C what C# is to C++/Java. Makes you not need to have a big OOP language to "make it easy". OPIA is now based highly off of this.
The Proton language proposal - An excellent example of somebody else trying to do the same thing. I almost took "mixins" from this, but then switched to a Go style which separates polymorphic behavior from type definition (e.g. structs are separated from methods and interfaces).

One thing that I often get a lot of heat for is when I say that I do not use traditional compiler tools: NO lexer, NO compiler-compiler, no regex, etc. ... I feel that these tools are generally the foundation of any decent compiler; but that if you understand the reason for them (and if you have a SOLID understanding of dynamic programming and asymptotic performance), then you can design it better by hand. For example, even if you use regex to parse tokens, you still have to read in each character and correctly form identifiers, numbers, string-literals, operators, comments, etc., so I do that directly.

I have all the predefined tokens defined as final instances, which get stored in a binary tree at creation; after that, lookup is just a binary search. Identifiers get added to the tree as they appear, in such a way that ANY instance of the same identifier WILL result in the very same Token object. By storing tokens as objects rather than ints (which is fine, because object references take up no more space), each token has built-in functionality. There are functions/methods that instantly tell if the token is of a certain type or fits certain other queries (e.g. can it be a "starting" token of a statement or global declaration) by looking directly at a bit-field rather than saying "return this == a || this == b || this == c", which takes COMPUTATION each time.

My token-getter simply grabs the next token from an input, and other parts of the compiler chain directly to it (meaning that you don't have to create a huge list for all the tokens and then go through the list again); however, each aspect of the compiler is completely modular from the others (which is why my compiler will also be an API to its own aspects: tokenizing, building syntax trees, etc. can all be used directly).
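Roughly, in Python terms (my implementation is Java; the flag names and values here are invented for illustration):

Code:
# Rough Python rendering of the scheme described above: interned Token
# objects plus bit-field category tests. Flag names/values are invented.
KEYWORD, OPERATOR, STMT_START = 1 << 0, 1 << 1, 1 << 2

class Token:
    _interned = {}   # text -> Token: same text always yields same object

    def __new__(cls, text, flags=0):
        tok = cls._interned.get(text)
        if tok is None:
            tok = super().__new__(cls)
            tok.text, tok.flags = text, flags
            cls._interned[text] = tok
        return tok

    def can_start_statement(self):
        return bool(self.flags & STMT_START)   # one AND, no == chains

IF = Token("if", KEYWORD | STMT_START)
assert Token("if") is IF   # any later lookup returns the same instance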
shkaboinka wrote:

Well I certainly wouldn't urge you to abandon it; but since I've "restarted" OPIA a few times (including scrapping a ton of mostly set code), maybe some discussion would either spark another shot for you or at least help me to grab some helpful aspects.
...
If I could "do what they do" in the most efficient way for the z80...

One thing I had sketched out was making sure I had operators corresponding to most of the arithmetic/logical operations available on the z80, with higher specificity than C and most of its derivatives allow (for example, bitwise rotation). There's some other stuff for which I'm no longer quite sure what I had in mind, because my notes were pretty terribly organized. First-class functions were definitely a must, but I don't think I had any intention of object-orientation; more like C-style programming. I'm sure if I did it now I'd do some things differently.
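For example, a dedicated rotate operator maps straight onto the z80's RLCA/RRCA instructions (one per bit of rotation), where C forces you to emulate it; in Python terms, the intended semantics would be something like this (purely illustrative):

Code:
# Hedged sketch of 8-bit rotate semantics that dedicated operators
# could expose directly, versus the shift-and-or emulation C requires.
def rotr8(value, n):
    n &= 7
    return ((value >> n) | (value << (8 - n))) & 0xFF

def rotl8(value, n):
    n &= 7
    return ((value << n) | (value >> (8 - n))) & 0xFF

assert rotr8(0b00000001, 1) == 0b10000000   # bit 0 wraps around to bit 7
assert rotl8(0b10000000, 1) == 0b00000001   # bit 7 wraps around to bit 0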


Quote:
One thing that I often get a lot of heat for is when I say that I do not use traditional compiler tools: NO lexer, NO compiler-compiler, no regex, etc. ... I feel that these tools are generally the foundation of any decent compiler; but that if you understand the reason for them (and if you have a SOLID understanding of dynamic programming and asymptotic performance), then you can design it better by hand. For example, even if you use regex to parse tokens, you still have to read in each character and correctly form identifiers, numbers, string-literals, operators, comments, etc., so I do that directly.

Nice in theory, but only really a good idea if getting your language to work is secondary to learning about those subjects. If I pick this project up again, I almost certainly wouldn't try implementing the Shunting Yard algorithm by hand, because there are existing parsing tools to do the job I want done, and my time is valuable. The further you get in your education/research/career, the more this last point will become relevant.

Quote:
There are functions/methods that instantly tell if the token is of a certain type or fits certain other queries (e.g. can it be a "starting" token of a statement or global declaration) by looking directly at a bit-field rather than saying "return this == a || this == b || this == c", which takes COMPUTATION each time

Tagged values are nice Smile

Quote:
My token-getter simply grabs the next token from an input, and other parts of the compiler chain directly to it (meaning that you don't have to create a huge list for all the tokens and then go through the list again); however, each aspect of the compiler is completely modular from the others (which is why my compiler will also be an API to its own aspects: tokenizing, building syntax trees, etc. can all be used directly).

Why not write front + back ends for LLVM?
elfprince13 wrote:
shkaboinka wrote:
One thing that I often get a lot of heat for is when I say that I do not use traditional compiler tools: NO lexer, NO compiler-compiler, no regex, etc. ... I feel that these tools are generally the foundation of any decent compiler; but that if you understand the reason for them (and if you have a SOLID understanding of dynamic programming and asymptotic performance), then you can design it better by hand. For example, even if you use regex to parse tokens, you still have to read in each character and correctly form identifiers, numbers, string-literals, operators, comments, etc., so I do that directly.

Nice in theory, but only really a good idea if getting your language to work is secondary to learning about those subjects. If I pick this project up again, I almost certainly wouldn't try implementing the Shunting Yard algorithm by hand, because there are existing parsing tools to do the job I want done, and my time is valuable. The further you get in your education/research/career, the more this last point will become relevant.

That is only the case if it's about completing a task (albeit effectively). The "correct way" to write a compiler is by hand, and it's entirely practical for a quality product (see HERE by an expert). But I agree that, when the priority is just to generate a working compiler (and that's most cases), it is entirely impractical to code it by hand when there are well-proven "compiler tools" out there. But the widespread misdirection that those tools are the correct (or even best) way to do it is very upsetting to me as a computer scientist.

I discovered that Java's linked lists append by converting one to an array first. Since that totally violated my reasons for using them, "doing it right" meant whipping together a lightweight linked-list class that served my purposes. That ought to be normal practice (not dormant knowledge) for anyone wanting to call themselves a computer scientist; and something like that ought to be a trivial case.

When I designed my own game-state-manager for my Game Design class (and yes, I looked at other designs first), it was a bit of a challenge, but entirely practical. Using transformation matrices directly instead of the bulky mediocre classes in the book (which made things more complicated than they needed to be) was entirely practical. Making my own 2-year planners that fit nicely in my pocket (and look snazzy) has been entirely practical.

Now, the vast majority of computer science graduates do not need to be more than mildly familiar with compilers; but computer science is about being able to design a framework or algorithm (albeit with intelligent evaluation of what's out there and how others do it) for whatever the situation may be -- even (and especially) brand new ones! A computer scientist is supposed to be an expert at this! ...Unfortunately, Computer Science is being molested by the job market: the education system is now designed to churn out "programmers" rather than innovators...

To be fair though, I've endeavored to become an expert on compiler design; but since this project is to be my Magnum Opus (since 2004), doing the research and understanding the theory of it was the only right way to do it, since I am going to do it right rather than just make it work. This is MY work! I feel that as a computer scientist, I can and should act as my own expert on my research, and in this case I decided that I can code it much better than it would be if I used other tools (I have seen the bulky nastiness of generated compilers...); and as a computer scientist (having adequate knowledge on a specific topic, and everyone has their own), doing so is a trivial matter (though I might mean "trivial" in the sense that building a house would be trivial for a very experienced contractor; it will still take a while).

elfprince13 wrote:
shkaboinka wrote:
...each aspect of the compiler is completely modular from the others (which is why my compiler will also be an API to its own aspects: tokenizing, building syntax trees, etc. can all be used directly).

Why not write front + back ends for LLVM?

...Ok, I can get behind something like LLVM. Intermediate code and SSA form are pretty dang standard and universal to basically any language on any machine. Since I'd be converting my code to SSA and something at least akin to "three address code", I will consider it -- though there are some z80-specific design aspects, and there is a lot that I want to make sure happens with data-flow analysis and some things being precomputed...
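(For anyone unfamiliar: three-address code just flattens expressions into single-operation steps, and it's nearly SSA already if every temporary is assigned exactly once. A toy sketch, with an invented AST shape:)

Code:
# Toy lowering of an expression AST to three-address code; since each
# temporary is assigned exactly once, the output is essentially SSA.
from itertools import count

def lower(ast):
    code, temps = [], count(1)
    def walk(node):
        if node[0] == "num":
            return str(node[1])
        a, b = walk(node[2]), walk(node[3])
        tmp = "t%d" % next(temps)
        code.append("%s = %s %s %s" % (tmp, a, node[1], b))
        return tmp
    walk(ast)
    return code

# lower(("binop", "*", ("binop", "+", ("num", 1), ("num", 2)), ("num", 3)))
# => ["t1 = 1 + 2", "t2 = t1 * 3"]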

Anyway, what I was actually referring to about modularity is that being packaged in a Java JAR file means all the classes will be accessible. For this reason, I am leaving public static methods in them so that one can directly reach in and say "tokenize this" or "parse this expression" or "add a node directly to this syntax tree". Being an executable JAR, the main code will take command-line input and pipe these parts together with very little effort (I chose Java so that it could run anywhere, and also so that it's automatically an API by how I code it). The benefit is that an editor, project manager, or any other sort of tool that could be created remains separate, but the core compiler will be all bundled up and keep the language uniform (just like how Java works).
...Sorry for that long response; you opened up a very passionate topic/stance, and I had to let it all out ... Ignore previous edits, as I made a LOT of revisions Razz
Just found my notes on the language:

Quote:
Comments
//
/* */

Preprocessor

``COMMAND operands
examples:
``INCLUDE
``DEFINE
also, conditionals


types
void, byte, short, array/pointer, struct/typedef

struct[optional alignment/padding factor]{ fields; } name;
typedef old_type new_name;

Operators

=

+
-
*
/
%

++
--

[] index
[:] range

<
>
<=
>=
==
!=

?:
<=>

^
&
|
~

^^
&&
||
!

.
->

SHIFT/ROTATE

@> Rotate Right
<@ Rotate Left

>> Arithmetic Right Shift
<< Arithmetic Left Shift

>>> Logical Right Shift
<<< SLL - (a <<< n) == ((a << n) + ((1 << n) - 1))

Number/Literal Notation
0xnnnn $nnnn nnnnh (hex)
%nnnnn nnnnnb (bin)
nnnnd nnnnn (dec)
nnnnq or nnnno (octal)
'c' or '\\' or '"' or '\'' or '\nn' (character constants)


code locatability

!@ address
@!

Functions/Function References
def (inline) returntype fcnname(type arg1, type arg2, type arg3, type etc){

}

def (inline) returntype :(type arg1, type arg2, etc){} (anonymous function)

returntype#:(type arg1, type arg2, etc) (function pointer)




Register conventions??

Operator Definition/Overloading - always inline
opdef opsymbol(arg1, arg2, optionally arg3 for ranges and ?:)

dereferencing/addressing
@ dereference
# address

scoping
{}

control

switch/case/default
break/continue
each (takes a range)
for
while
do
if/elif/else
label
goto (scope-limited)

inline assembly
${

}$

Separators, statement terminators
, ;

Array Definitions
[[ ]]
r"characters go here" (Raw character array)
"characters go here" (null-terminated C string)
p"characters go here" (Pascal style string - max 255 bytes)
P"characters go here" (Pascal style string - max 65536 bytes)


PROGRAM STRUCTURE

/* Comments */

// global definitions and declarations

//Modules - programs must have one module at progstart
// - applications must have one module at appstart
// - applications must be file separated by page
//

!@ (modulename,address,entrypoint)



@!
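
As a footnote to those string-literal forms, here's a hedged sketch of the byte layouts they might emit (inferred from the notes above; 8-bit charset assumed):

Code:
# Hedged sketch of what the four string-literal forms might emit;
# layouts inferred from the notes, with an 8-bit charset assumed.
def emit_string(text, form='"'):
    data = text.encode("latin-1")
    if form == "r":                  # raw character array: bytes only
        return data
    if form == '"':                  # C style: null-terminated
        return data + b"\x00"
    if form == "p":                  # Pascal style: 1-byte length prefix
        assert len(data) <= 255
        return bytes([len(data)]) + data
    if form == "P":                  # Pascal style: 2-byte length prefix
        assert len(data) <= 65535
        return len(data).to_bytes(2, "little") + data

# emit_string("HI", "p") => b"\x02HI"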




shkaboinka wrote:
That is only the case if it's about completing a task (albeit effectively). The "correct way" to write a compiler is by hand, and it's entirely practical for a quality product ... entirely impractical to code it by hand when there are well-proven "compiler tools" out there.

This depends entirely on what you mean "by hand". Even if you aren't going the prepackaged route, if you aren't at least partially creating your own toolset as you go, you're going to be in for a lot of pain when you need to change the grammar or tokenizing rules for your language and you have to make the changes in code rather than in a specification file somewhere.

Quote:
I discovered that Java's linked lists append by converting one to an array first. Since that totally violated my reasons for using them, "doing it right" meant whipping together a lightweight linked-list class that served my purposes. That ought to be normal practice (not dormant knowledge) for anyone wanting to call themselves a computer scientist; and something like that ought to be a trivial case.

Let's be fair here: first-year d&a are a little bit different from compiler design, and there are other areas of research within computer science. I wouldn't expect a computational geometer to be "up" on parsing algorithms any more than I would expect someone with an interest in compiler optimization techniques to be "up" on geometric group theory. Also, I'm pretty sure that you were doing something wrong if Java was converting your lists to arrays as an intermediate step - I'd be curious to see the code you thought was doing that.

Quote:
I feel that as a computer scientist, I can should respond to my research as my own expert, and in this case I decided that I can code it much better than it would be if I used other tools (I have seen the bulky nastiness of generated-compilers...); and as a computer scientist (having adequate knowledge on a specific topic, and everyone has their own), doing so is a trivial matter (though I might mean "trivial" in the sense that building a house would be trivial for a very experienced contractor; it will still take a while).

This is fair of course, but a lot of generated compilers have bulky insides because they're designed to be fast, and there's that whole pesky Time/Memory tradeoff thing at work - and if they're slow, it's often because the person designing the grammar didn't know which forms of recursion were performant for the particular parsing algorithm chosen.
elfprince13 wrote:
r"characters go here" (Raw character array)
"characters go here" (null-terminated C string)
p"characters go here" (Pascal style string - max 255 bytes)
P"characters go here" (Pascal style string - max 65536 bytes)

That is an AWESOME idea! ... I may have to steal it Smile

elfprince13 wrote:
Even if you aren't going the prepackaged route, if you aren't at least partially creating your own toolset as you go, you're going to be in for a lot of pain when you need to change the grammar or tokenizing rules for your language and you have to make the changes in code rather than in a specification file somewhere.

Very true, and duly noted. ...I've made that mistake before, and it sucks! ...So I realize the risk, though this time I feel like I REALLY know what I'm in for, and am making sure to hammer down the language very tightly before I do anything big. Even then, I try to keep it very modular so that changes are not too difficult. Once I get it most of the way done, I am sticking with the grammar though (you have to at some point).

elfprince13 wrote:
Let's be fair here: first-year d&a are a little bit different from compiler design, and there are other areas of research within computer science. I wouldn't expect a computational geometer to be "up" on parsing algorithms any more than I would expect someone with an interest in compiler optimization techniques to be "up" on geometric group theory. Also, I'm pretty sure that you were doing something wrong if Java was converting your lists to arrays as an intermediate step - I'd be curious to see the code you thought was doing that.

Agreed that it's a big difference per se. I was thinking of compiler design for somebody writing a compiler as compared to game design for somebody writing a game. My point was that, whatever the focus is, I feel that a computer scientist should be able to do it themselves (and make an intelligent decision of whether that is necessary / worth it). ... As for Java's Linked Lists, look at the java.util.LinkedList.addAll method: it takes an abstract collection and converts it to an array without even testing to see if it's a list! (there is no other method provided for combining linked lists).

elfprince13 wrote:
A lot of generated compilers have bulky insides because they're designed to be fast, and there's that whole pesky Time/Memory tradeoff thing at work - and if they're slow, it's often because the person designing the grammar didn't know which forms of recursion were performant for the particular parsing algorithm chosen.

I admit that I might not have grasped what mechanisms were in place the first time I looked at it; but now that I know how parse tables etc. work, I can say that they truly are well designed, dynamic, and FAST! ...However, I can make code that is at least as fast by doing the same task without going through the extra framework, though it requires hard-coding a lot of the logic in. The standard compiler framework, as I said, is pretty darn efficient, fast, and all-around solid; but it's not all about efficiency and speed: it's also about being able to code the grammar separately so it's very easy to modify or rewrite. I feel like I can write a decent compiler without having to go through the extra layer (the "bulk" is all the extra mechanisms just so it can work from a grammar, rather than the compiler being coded just for one grammar), and do so well.
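To make the "hard-coded" style concrete: each grammar rule simply becomes its own function, with no table layer in between. A tiny sketch (grammar and token handling invented for illustration):

Code:
# Sketch of the hand-coded style being described: one function per
# grammar rule, no parse tables. Grammar and tokens are invented.
def parse_expr(tokens, pos=0):        # expr := term (("+" | "-") term)*
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_term(tokens, pos):          # term := NUMBER
    return ("num", int(tokens[pos])), pos + 1

# parse_expr(["1", "+", "2", "-", "3"])[0]
# => ("-", ("+", ("num", 1), ("num", 2)), ("num", 3))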

Anyway, I certainly don't have anything bad to say about compiler tools -- they are great! ... I just get tired of people thinking that if you do it all yourself that it automatically means that either you don't know what you are doing, or that you are making it more difficult/bulky/etc. than it needs to be. When I start actually doing more with the code, I am going to document and compare the pipeline/flow of what books say a compiler is versus what I am going to do (which follows the books anyway, but cuts corners because the grammar is a given rather than another layer).
shkaboinka wrote:
elfprince13 wrote:
r"characters go here" (Raw character array)
"characters go here" (null-terminated C string)
p"characters go here" (Pascal style string - max 255 bytes)
P"characters go here" (Pascal style string - max 65536 bytes)

That is an AWESOME idea! ... I may have to steal it Smile

In most programming languages it would be an awful idea... but since z80 routines are all over the place as to which convention they use, I figured it best to support all of them out of the box.


Quote:
As for Java's Linked Lists, look at the java.util.LinkedList.addAll method: it takes an abstract collection and converts it to an array without even testing to see if it's a list! (there is no other method provided for combining linked lists).

The array conversion seems to vary by implementation (e.g. GNU Classpath vs OpenJDK), but the key thing is that it does appear to be consistently an O(n) operation, since the addAll method copies elements from arbitrary Collections. This is probably actually a good thing in most cases because you don't want to have confusion between which operations copy data and which share data. Single-adding still works in O(1) time.
elfprince13 wrote:
The array conversion seems to vary by implementation (e.g. GNU Classpath vs OpenJDK), but the key thing is that it does appear to be consistently an O(n) operation, since the addAll method copies elements from arbitrary Collections. This is probably actually a good thing in most cases because you don't want to have confusion between which operations copy data and which share data. Single-adding still works in O(1) time.

::turns red:: Of course it copies (I totally forgot that in my post) ... I guess a better statement would be that I made a lightweight linked-list that consumes lists on append, since I am just using it specifically to chain multiple pieces together (and speed was important) ... but yeah, any good collections API ought to just copy on append (though a "consume" method would be nice for a linked list or tree) :/
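Something like this is what I mean, sketched in Python for illustration (the actual context is Java, and all names here are invented):

Code:
# Illustrative consume-on-append linked list: appending another list
# splices its nodes in O(1) and empties it, instead of copying them.
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value):
        self.value, self.next = value, None

class ConsumingList:
    def __init__(self):
        self.head = self.tail = None

    def append(self, value):          # O(1) single add
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def consume(self, other):         # O(1) splice; empties `other`
        if other.head is None:
            return
        if self.tail is None:
            self.head = other.head
        else:
            self.tail.next = other.head
        self.tail = other.tail
        other.head = other.tail = None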
  