11 Apr 2012 11:47:31 am by shkaboinka
Some final notes before I begin (in May or June?). THESE ARE DECIDED (tentatively):
- The new numeric datatypes are byte, sbyte, int, and uint.
- String literals can take on different forms:
  - "null terminated string"
  - b"byte-prefixed string"
  - i"int-prefixed string"
  - r"raw string"
- Numeric literals are of type "int" by default. Type-casts can be used to be explicit: (byte)5;
- Hexadecimal and binary literals are prefixed with 0x and 0b, respectively, and default to type "byte" or "uint" according to the number of digits used (so as to reflect a literal bit representation).
- Dereferencing is now on the right-side:
  - *[]byte pa; (*pa)[idx];
  - []*byte ap; *(ap[idx]); // or just *ap[idx]
- Constant (immutable) variables are defined with "const" in addition to the datatype. Constant expressions (pasted in like macros) are defined with "const" alone. It is illegal to mix "const" or "volatile" with the ":=" operator.
- There are standard-, BASIC-, and iterator-style for-loops:
  - for(init; condition; update)
  - for(var: start, end, inc) (inc is optional)
  - for(var: array); for(var: someYieldyCofunc)
- The inferred type of an array literal works as follows:
  - []T{...} = a static array of the given values
  - &[]T{...} = pointer to the given static array
  - new [n]T = pointer to a new (uninitialized) array allocation
  - new []T{...} = pointer to a new array allocation (values copied from a static array)
- Arrays with the (first) dimension omitted ([]T or [,n]T) are pointers.
- Arrays of the form [ , ,...] are rectangular (stored in one static allocation).
- Arrays of the form [][][]... are jagged (the "inner" arrays are stored as pointers).
- Arr[3..6] is shorthand for the tuple expression (Arr[3], Arr[4], Arr[5]).
- Tuples will remain "auto-unpacked", and no tuple-variables are allowed.
- Method receivers ("this") are ALWAYS (intrinsic) pointers.
- Methods may be defined within structs, just as in Java/C# (compact/familiar).
- There will be an "x@value" syntax for embedding variables within array literals.
- Addressing goes on the LHS and applies to the whole RHS (e.g. &a[n] is &(a[n])).
- Entities may not be defined within each other (e.g. no structs within structs, etc.).
- Expressions cannot contain statements (declarations, assignments, calls, var++/var--).
- Anonymous functions may not refer to (non-static/const) external local variables.
- All code is precomputed as much as possible (without unrolling loops or recursive calls). The $ operator requires something to be interpreted, including loops and recursive calls.
- Bridge methods will be inserted for multiple "inheritance" of anonymous fields, as needed.
- Control-flow constructs will indeed require parentheses (avoids parsing conflicts with literals).
- No "static" members of anything (but static local vars and static initialization blocks are allowed).
- Entities Have Global Accessibility If They Are Capitalized, and namespace accessibility otherwise.
- Look-Up-Tables (rather than Jump Tables) will be used with switches and if-else chains where possible.
- Methods can only be defined for "identifier" types (structs, primitives, etc., but not funcs, arrays, etc.).
- Namespaces may be nested ("Outer.Inner" syntax), and there will be a "using Namespace" mechanism.
- Self-modifying code will be used with cofuncs and switch-variables (I will consider an option to disable it).
- Explicit variable addresses can be nominal (@"address") or refer to another variable (@x or @arr[n].foo).
- No exception-handling or "try-catch" mechanism (use multiple return values or create an "Exit()" instead).
- Type-casts will be represented traditionally (e.g. "(byte)(a+b)").
- "Extra" parentheses are not allowed within datatypes ("func(...)" requires them, but []*T and *[]T are unambiguous).
- Function pointers without any return values may point to functions with return values (e.g. func(byte) pointing to func(byte):byte).
- Values will be passed/returned in registers such that any two functions with the same pattern of arguments will use the same registers for them.
- Default arguments (and struct members) must come last, and will be embedded in functions so they can be pointed-to as their reduced versions.
- An anonymous (nameless) struct/interface/cofunc/func within a namespace will take on the name of the namespace (e.g. "List myList" rather than "ListNameSpace.ListStruct myList"). This also gives namespace values ("List.staticValue") the feel of Java/C#'s static class members.

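To make the rectangular-vs-jagged distinction in the list above concrete, here is a small Python model (not OPIA code). It assumes row-major order inside the single rectangular allocation, which the list doesn't specify, and `RectArray`, `JaggedArray`, and `range_tuple` are made-up names for illustration. `range_tuple` mirrors the `Arr[3..6]` shorthand, where the end index is excluded.

```python
# Hypothetical model of the two array layouts (not OPIA code).
# Assumption: a rectangular [rows, cols] array is stored row-major in ONE
# flat allocation; a jagged [][] array is an outer array of "pointers"
# to separately allocated inner arrays.

class RectArray:
    """Rectangular [rows, cols] array in a single flat allocation."""
    def __init__(self, rows, cols, fill=0):
        self.rows, self.cols = rows, cols
        self.data = [fill] * (rows * cols)  # one contiguous block

    def offset(self, i, j):
        # Row-major: element (i, j) sits i*cols + j slots from the start.
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError((i, j))
        return i * self.cols + j

    def get(self, i, j):
        return self.data[self.offset(i, j)]

    def set(self, i, j, value):
        self.data[self.offset(i, j)] = value


class JaggedArray:
    """Jagged [][] array: each row is its own allocation reached via a pointer."""
    def __init__(self, row_lengths, fill=0):
        # Each inner list stands in for a separately allocated inner array,
        # so rows may have different lengths.
        self.rows = [[fill] * n for n in row_lengths]

    def get(self, i, j):
        return self.rows[i][j]  # follow the row pointer, then index

    def set(self, i, j, value):
        self.rows[i][j] = value


def range_tuple(arr, start, end):
    """Model of Arr[start..end]: elements start..end-1 as a tuple (end excluded)."""
    return tuple(arr[k] for k in range(start, end))
```

The rectangular form needs one multiply-and-add per access but no pointer storage; the jagged form spends a pointer per row in exchange for rows of different lengths.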
16 Apr 2012 02:42:24 pm by shkaboinka
I just want to note that I've been updating/revising that list in my previous post (as well as cleaning it up and making it MORE READABLE). Any opinions?

21 Apr 2012 02:58:08 pm by shkaboinka | Different Processors -> Datatypes
I am considering the possibility of having OPIA build for various platforms (z80, 68k, whatever the NSpire and CSX are), so I need to design with that option in mind (your input would greatly help; keep reading). This most directly affects datatypes:
One option would be to use the standard byte, short, int, long types (8, 16, 32, 64 bits). The upside is that type sizes would be consistent across platforms. The downside is that it would feel goofy using "shorts" (and "bytes") on the z80 and other types on other platforms. The option I'm considering is to use just "byte" and "int", with "int" being whatever the "word size" is. Type sizes would not be consistent, but each processor is probably best suited to work with its word-sized ints anyway. I don't know if I will actually make it for more than just the z80, but I do want to design so that this could happen easily and without affecting how the language is defined. IT WOULD HELP TREMENDOUSLY IF ANYONE COULD TELL ME WHAT SIZES OF VALUES THE DIFFERENT PROCESSORS WORK WITH, AND THE LIMITATIONS OF EACH (68k, nspire, casio) (e.g. the z80 works best with 8-bit values, but has a 16-bit word size and can work with those as well, though some operations require multiple instructions). ... I just want to be able to make informed decisions.

21 Apr 2012 03:39:10 pm by TheStorm
If the overhead is not too great, I would go the C route of having ints be the platform word and pointer size, and then have "uint8/16/32"-like data types for when you need exactly that many bits; the platform-specific "headers"/implementation would then convert those to the needed data type.

21 Apr 2012 04:58:52 pm by shkaboinka
I could map explicit types like this:
- Explicit types: int8, uint8, int16, uint16, int32, uint32, int64, uint64; char8, char16; bool (8 bits); byte (identical to uint8, but suggests a "raw data" connotation).
- Plain (device-specific) types: int, uint, char, word (like byte, but identical to uint).
I could then provide preprocessor directives to indicate what the word size is (and the char size), which will cause the plain types to be identical to their explicit counterparts (e.g. int == int16 for the z80), as well as which types are allowed (e.g. I'd disallow 32- and 64-bit values on the z80). These could reside in environment declaration-blocks, so that all environment information would stay in one package (which most people will never need to look at). I'll also consider changing some of my directives ("environment =" to "#environment =", and, to be more consistent with other languages' conventions, "#for" to "#if", and maybe throw in an "#else"). Side note: does anybody know if any sizes (8/16 bit) are particularly difficult/limiting to work with on any of the graphing-calculator processors? (For example, perhaps one has a word size of 32 bits and some limited operations for 16-bit values, but very few operations for bytes.) ... If that's the case, I might load some values into larger registers when that is more efficient (but still load/store them from/to appropriately sized memory locations).

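A rough Python sketch of the environment idea above. The type table, the `make_environment`/`resolve` helper names, and the 8-bit char for the z80 are my own illustrative assumptions, not a settled design. It shows the plain types being aliased to explicit ones from the declared word size, and widths above the target's limit being rejected.

```python
# Hypothetical "environment" model: explicit fixed-width types always exist
# (up to the target's limit), and the plain types are aliased to explicit
# ones based on the word/char sizes the environment declares.

EXPLICIT_BITS = {
    "int8": 8, "uint8": 8, "int16": 16, "uint16": 16,
    "int32": 32, "uint32": 32, "int64": 64, "uint64": 64,
    "char8": 8, "char16": 16, "bool": 8, "byte": 8,
}

def make_environment(word_bits, char_bits, max_bits):
    """Map each usable type name to the explicit type it resolves to.

    word_bits / char_bits: sizes declared by the (hypothetical) directives.
    max_bits: widths above this are disallowed (e.g. no 32/64-bit on the z80).
    """
    if word_bits > max_bits:
        raise ValueError("word size exceeds the largest allowed width")
    env = {name: name for name, bits in EXPLICIT_BITS.items() if bits <= max_bits}
    env["byte"] = "uint8"              # byte is identical to uint8 ("raw data")
    env["int"] = f"int{word_bits}"     # plain int follows the word size
    env["uint"] = f"uint{word_bits}"
    env["word"] = f"uint{word_bits}"   # like byte, but identical to uint
    env["char"] = f"char{char_bits}"
    return env

def resolve(env, type_name):
    """Return the explicit type behind type_name, or fail if it is disallowed."""
    if type_name not in env:
        raise TypeError(f"type '{type_name}' is not available on this target")
    return env[type_name]

z80 = make_environment(word_bits=16, char_bits=8, max_bits=16)
```

With `z80` as above, `resolve(z80, "int")` gives `"int16"`, while asking for `"int32"` raises `TypeError`, which is the "disallow 32 and 64 bit values on the z80" behavior.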
26 Apr 2012 12:24:02 pm by shkaboinka
Ahem (??): int uint int8 uint8 int16 uint16 int32 uint32 int64 uint64 char char8 char16 byte word bool short long float double ... ??
That's suddenly a LOT of types in place of just (u)byte, (u)word, bool, char. ... Is that making things way too complicated in the name of universality? Is it worth making sure each device (not just the z80) uses the most appropriate default size AND can access specific sizes? Is it too much of a hindrance to just stick with bytes and words, and accept that more capable platforms will have to work with those on the grounds that "Well, OPIA is for small embedded environments"? Would it be nasty to just have a fixed-size setup, and have z80 programs stuck using bytes and "shorts" while other programs use ints (which hurts portability)? Would it be way too awesome to have code that compiles straight from one device to another, given that you change a flag or an include?

27 Apr 2012 04:50:04 pm by seana11 | Re: Different Processors -> Datatypes
/me thought that OPIA was designed to take advantage of specific features of the z80 and exploit its quirks in a way that C could not, because C was designed for multiple architectures. Prizm/fx9680 run on the SuperH architecture.

28 Apr 2012 01:06:20 am by shkaboinka | Re: Different Processors -> Datatypes
It still would; but portability would be awesome!

08 May 2012 10:28:11 pm by shkaboinka
...Ok, I'll plan to keep it on the z80; but how about this setup for primitive datatypes:
- byte, char, bool: unsigned 8-bit values
- sbyte: signed 8-bit values
- int: signed 16-bit values
- uint: unsigned 16-bit values
Numeric literals would be resolved (by default) as follows:
- 55: int (55i: int)
- 55u: uint
- 55b: byte
- 55s: sbyte
Hexadecimal and binary literals would resolve (by default) according to the number of digits, but explicit indicators can be used as well:
- 0x1: byte (1 or 2 digits)
- 0x001: uint (3 or 4 digits)
- 0x001b: byte (b indicator)
- 0x1i: int (i indicator)
- 0b1: byte (8 or fewer digits)
- 0b000000001: uint (9+ digits)
- 0b1u: uint (u indicator)
...Note that those are DEFAULT evaluations (e.g. "X := 5" makes X an int); a clear context may result in a different type (e.g. perhaps the compiler can tell that looping from 1 to 10 only needs a byte). For string literals:
- "null terminated string"
- r"raw string literal"
- b"byte-prefixed string"
- u"uint-prefixed string"
... How does that all sound?

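Here is a minimal Python sketch of the default decimal-literal rules listed above (suffix picks the type, otherwise int), with a range check added. The range check, the helper name `decimal_literal_type`, and treating a leading minus sign as a separate unary operator are assumptions on my part; hex/binary literals are left out because they use the digit-count rule instead.

```python
# Hypothetical model of the proposed DEFAULT typing for decimal literals.
# Ranges are the usual 8/16-bit signed (two's complement) and unsigned ones.

RANGES = {
    "byte": (0, 255),
    "sbyte": (-128, 127),
    "int": (-32768, 32767),
    "uint": (0, 65535),
}

SUFFIX_TYPES = {"": "int", "i": "int", "u": "uint", "b": "byte", "s": "sbyte"}

def decimal_literal_type(text):
    """Return (value, type) for a decimal literal such as '55', '55u', '55b'."""
    digits = text.rstrip("iubs")   # strip any trailing suffix letters
    suffix = text[len(digits):]
    # A minus sign is assumed to be a unary operator, so it is not accepted here.
    if not digits.isdigit() or suffix not in SUFFIX_TYPES:
        raise ValueError(f"malformed literal: {text!r}")
    value = int(digits)
    type_name = SUFFIX_TYPES[suffix]
    lo, hi = RANGES[type_name]
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {type_name}")
    return value, type_name
```

For example, `"40000"` is rejected as an int (signed 16-bit tops out at 32767), while `"40000u"` resolves to a uint.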
18 May 2012 12:36:38 pm by elfprince13
Assuming you're building a proper AST, have the node type for a numeric literal be the smallest data type that will hold it. I assume casts to increase precision/data-size are already implicit? The only tricky spot is bitmath, and if your parser differentiates between 0x00ff and 0xff when assigning datatypes in the AST, I don't think that's a problem.

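A small Python sketch of elfprince13's suggestion: tag a literal node with the narrowest type that holds it, and allow implicit widening only when the target range covers the source range. The type names and ranges come from the z80 set proposed earlier in the thread; preferring unsigned over signed at equal width, and the helper names, are my own assumptions.

```python
# Hypothetical "smallest type that holds the value" rule for literal nodes,
# plus the matching check for implicit (widening) conversions.

CANDIDATES = [  # narrowest first; unsigned before signed at equal width (assumption)
    ("byte", 0, 255),
    ("sbyte", -128, 127),
    ("uint", 0, 65535),
    ("int", -32768, 32767),
]

def smallest_type(value):
    """Pick the first (narrowest) candidate type whose range contains value."""
    for name, lo, hi in CANDIDATES:
        if lo <= value <= hi:
            return name
    raise OverflowError(f"{value} does not fit in any 8- or 16-bit type")

def widen(from_type, to_type):
    """An implicit cast is safe only if to_type's range covers from_type's."""
    ranges = {name: (lo, hi) for name, lo, hi in CANDIDATES}
    (flo, fhi), (tlo, thi) = ranges[from_type], ranges[to_type]
    return tlo <= flo and fhi <= thi
```

Note that byte widens to both uint and int, but sbyte cannot widen to uint (negative values) and uint cannot widen to int (values above 32767), so those two still need explicit casts.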
20 May 2012 10:54:06 pm by shkaboinka
I think I'll have the default be the full int size; however, I also intend to have the compiler "optimize" by using the smallest size that it can determine is necessary (when possible). What I've done before when parsing expressions is to leave the size unspecified at each level until/unless an inner value made it clear or the outer context demanded a certain size anyway. I think I'll do that again, but start from a default int size and narrow it where possible. Of course, known values will result in more direct optimizations. As for differentiating between 0x00ff and 0xff, I figured that since it is written in bits (well, nibbles) anyway, it is pretty clear what's what; though a suffix could still be used. If I can, I will also try to use the carry (or another) flag as an extra bit between operations if/when it can be determined that it might hold extra information (e.g. (MAX_INT * 2)/c).

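A Python simulation (not z80 code) of the carry-as-17th-bit idea in the last sentence. I treat the maximum as the unsigned 16-bit value 0xFFFF, which is an assumption since the post writes MAX_INT, and `shl16`/`div17_by16` are hypothetical names. Doubling pushes the top bit into the carry; a bit-by-bit (restoring) division that starts from that carry bit gets the right quotient, whereas dropping the carry does not.

```python
# Simulation of keeping the carry flag as a 17th bit between two 16-bit ops,
# as in (MAX * 2) / c. Assumption: unsigned 16-bit values, MAX = 0xFFFF.

MASK16 = 0xFFFF

def shl16(x):
    """16-bit x * 2 as a left shift: returns (low 16 bits, carry out)."""
    full = x << 1
    return full & MASK16, full >> 16  # carry holds the bit that fell off the top

def div17_by16(carry, low, divisor):
    """Divide the 17-bit value carry:low by divisor (restoring division).

    Returns (quotient, remainder). For divisor >= 2 the quotient fits in
    16 bits, since 0x1FFFF // 2 == 0xFFFF.
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be positive")
    dividend = (carry << 16) | low
    quotient, remainder = 0, 0
    for bit in range(16, -1, -1):  # 17 dividend bits, most significant first
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:   # subtract when the divisor "goes in"
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

With `low, carry = shl16(0xFFFF)`, dividing `carry:low` by 3 gives 43690 (the true 131070 / 3), while dividing only `low` gives 21844 remainder 2. Only the intermediate needs the extra bit; the final result is back in 16 bits.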
24 May 2012 05:34:36 pm by shkaboinka
Oops...
I cannot allow those indicators (suffixes) on hexadecimal representations, because 'b' is a valid digit. I'll just require hexadecimal and binary representations to be explicit (the type comes from the number of digits written), though type casts can be used instead:
0x1          // byte
int{0x1}     // int
0x001        // uint
sbyte{0x001} // sbyte
ALSO, should I use u"string" to indicate a uint-prefixed string, or should I use i"string" (i for int, though it would stay unsigned), because people will likely think in terms of bytes and ints (and so that people do not think they are looking at a u"unicode string")? ... I could allow either, but then there would be inconsistency. Is this still better than something like 1"this" or 2"this"?

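A Python sketch of how the proposed string-literal forms might be laid out as bytes, to make the prefix question concrete. Assumptions: 8-bit (ASCII) characters, no escape processing, the 16-bit length stored little-endian (the z80's native byte order for 16-bit values), and r"..." read as "just the bytes, with no terminator or length". That last reading is a guess; "raw" could instead mean "no escape processing". `encode_literal` is a hypothetical name.

```python
# Hypothetical byte layouts for the proposed string-literal forms.

def encode_literal(prefix, text):
    data = text.encode("ascii")          # assumption: 8-bit ASCII characters
    if prefix == "":                     # "..."  -> null-terminated
        return data + b"\x00"
    if prefix == "r":                    # r"..." -> bytes only (assumed meaning)
        return data
    if prefix == "b":                    # b"..." -> 1-byte length prefix
        if len(data) > 0xFF:
            raise ValueError("too long for a byte-prefixed string")
        return bytes([len(data)]) + data
    if prefix in ("i", "u"):             # i"..." / u"..." -> unsigned 16-bit length
        if len(data) > 0xFFFF:
            raise ValueError("too long for a uint-prefixed string")
        return len(data).to_bytes(2, "little") + data
    raise ValueError(f"unknown string prefix: {prefix!r}")
```

Whichever letter is chosen, i"..." and u"..." would produce the same bytes here; the choice is purely about which reads more naturally.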
24 May 2012 05:43:03 pm by merthsoft
Why not make it based on context? In C# if you do:
It has no issues, but if you do:
It says "Constant value '1048831' cannot be converted to a 'byte'".

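A Python sketch of merthsoft's "based on context" rule: when the destination type is known (for example, assigning to a byte), a constant is checked against that type's range rather than being given its default type. The ranges follow the thread's z80 types; the function name is hypothetical, and the error text only mimics the C# message quoted above rather than reproducing C#'s actual checker.

```python
# Hypothetical context-driven typing of constant expressions.

RANGES = {
    "byte": (0, 255),
    "sbyte": (-128, 127),
    "int": (-32768, 32767),
    "uint": (0, 65535),
}

def type_constant(value, context_type=None, default_type="int"):
    """Return the type a constant takes: the context's type if there is one."""
    target = context_type or default_type
    lo, hi = RANGES[target]
    if not lo <= value <= hi:
        raise TypeError(
            f"Constant value '{value}' cannot be converted to a '{target}'")
    return target
```

`type_constant(0xFF, "byte")` succeeds, while `type_constant(1048831, "byte")` fails with the same kind of message. Cases without a context (such as `x := 0x1` or an overloaded call) fall back to the default.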
24 May 2012 06:26:49 pm by shkaboinka
Merth: that will most definitely be the overriding case. But there are instances where the type needs to be determined from the value itself:
overloadedFunc(0x1);
x := 0x1; // define x in terms of the initial value
I JUST FOUND A MAJOR UH-OH, which means that I need to CHANGE SYNTAX AGAIN!
...Basically, the options consist of switching sides and choosing between matching the order of appearance or matching the order of evaluation; using a different symbol; and perhaps NOT moving the indexing [] to the left.

24 May 2012 10:32:46 pm by merthsoft
x will be an int. x := (byte)0x1; will make it a byte. I'm fine with casting.

25 May 2012 01:55:39 am by shkaboinka
I agree with using "int" by default (unless context specifies otherwise), and that casts are not burdensome; but I also wanted to provide those suffixes as shorthands, since other languages allow 55L for "long" or 1.0f for "float", etc. However, I want bit-literals (hexadecimal and binary) to be determined by the number of digits, since (1) you CAN provide digit-per-bit information (that's what a mask is anyway), and (2) "b" is already a valid hexadecimal digit, and therefore I'd rather not allow those suffixes on bit-literals:
w := 123;    // int
x := 123b;   // byte, etc. (u is uint, s is sbyte)
y := 0x01;   // (or even 0x1) byte
z := 0x0001; // (or 0x001) uint

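A Python sketch of the digit-count rule for bit literals: the number of digits written (leading zeros included) picks byte or uint, so 0x0001 is a uint even though its value would fit in a byte. Suffixes are rejected, which also shows why: "0x1b" is simply the hex value 27. The 16-bit cap and the helper name `bit_literal_type` are assumptions for this sketch.

```python
# Hypothetical digit-count typing for hex (0x...) and binary (0b...) literals.

HEX_DIGITS = set("0123456789abcdefABCDEF")
BIN_DIGITS = set("01")

def bit_literal_type(text):
    """Return (value, type) based on how many bits the written digits cover."""
    if text.startswith("0x"):
        digits, valid, bits_per_digit, base = text[2:], HEX_DIGITS, 4, 16
    elif text.startswith("0b"):
        digits, valid, bits_per_digit, base = text[2:], BIN_DIGITS, 1, 2
    else:
        raise ValueError(f"not a bit literal: {text!r}")
    # Reject empty literals and anything that is not a digit (e.g. a suffix).
    if not digits or not set(digits) <= valid:
        raise ValueError(f"bad digits in {text!r}")
    value = int(digits, base)
    written_bits = len(digits) * bits_per_digit
    if written_bits <= 8:     # 1-2 hex digits, or up to 8 binary digits
        return value, "byte"
    if written_bits <= 16:    # 3-4 hex digits, or 9-16 binary digits
        return value, "uint"
    raise ValueError(f"{text} has more than 16 bits of digits")
```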
25 May 2012 10:43:26 pm by shkaboinka
Maybe I can do away with those suffixes, since (1) it's roundabout to declare a variable based on the value AND sneak the type in (might as well just declare the type); and (2) the only other place I can think of where this is helpful is with overloaded functions, which can be resolved with type-casts anyway. Removing them would simplify the compiler and the language. However, the default type for numbers will be int, and I will probably keep the same rules for bitwise values (byte or uint, depending on the number of digits).
FOR THE TYPE ISSUE, I will just move the dereferencing to the left, and always have array-indexing bind first:

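The precedence rule above (prefix `*` and `&` on the left, array-indexing binding first) can be sketched as a tiny Python recursive-descent parser that returns nested tuples. The grammar is a deliberately minimal, hypothetical subset: names, `*`, `&`, parentheses, and `[name]` indexing only.

```python
# Hypothetical mini-grammar showing "indexing binds first":
#   unary   := ('*' | '&') unary | postfix
#   postfix := primary ('[' NAME ']')*
#   primary := NAME | '(' unary ')'
import re

NAME = re.compile(r"[A-Za-z_]\w*")

def tokenize(src):
    # Names and the symbols * & ( ) [ ]; any other character is skipped.
    return re.findall(r"[A-Za-z_]\w*|[*&()\[\]]", src)

def parse(src):
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def name():
        tok = take()
        if not NAME.fullmatch(tok):
            raise SyntaxError(f"expected a name, got {tok!r}")
        return tok

    def unary():
        # Prefix operators apply to the whole operand on their right.
        if peek() in ("*", "&"):
            op = take()
            return (op, unary())
        return postfix()

    def postfix():
        # Indexing is applied before any prefix operator sees the operand.
        node = primary()
        while peek() == "[":
            take("[")
            node = ("[]", node, name())
            take("]")
        return node

    def primary():
        if peek() == "(":
            take("(")
            inner = unary()
            take(")")
            return inner
        return name()

    tree = unary()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

So `*ap[idx]` parses as `("*", ("[]", "ap", "idx"))`, meaning `*(ap[idx])`, while indexing through a pointer to an array needs explicit parentheses: `(*pa)[idx]` gives `("[]", ("*", "pa"), "idx")`.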
28 May 2012 06:41:52 pm by shkaboinka
QUESTION TO ALL: If there was one thing you could add to (or change) about OPIA, what would it be?
EDIT: The main response (on SAX) has been a request for lambda expressions (a short syntax for anonymous functions):
I'd probably go with [2], because it really does look like a short form of a full anonymous function declaration; because ":" is already in the language, and "=>" is not; and because "func" makes it clear what it really is (I always thought the traditional lambda syntax looked a bit messy). However, because "func(foo):bar" also resembles a function-pointer type (and would therefore be highly context-dependent), [4] might be a better option, though it seems less clear than something with "=>" or "func" in it. Note that no datatypes are specified for arguments or return values in these lambda expressions. This is because they are meant to be short forms that can be easily inserted in expressions, and the intent is that the types would be taken from context.

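The sketch below does not depend on which lambda syntax is chosen; it is a hypothetical Python model of the last point, where a lambda carries only parameter names and a body, and the compiler fills in the types from the function-pointer type it is assigned or passed to. `FuncType`, `Lambda`, and `type_lambda` are illustrative names, not part of OPIA.

```python
# Hypothetical model of taking lambda parameter/return types from context.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class FuncType:
    params: Tuple[str, ...]   # parameter types, e.g. ("byte",)
    result: Optional[str]     # None means "no return value"

@dataclass(frozen=True)
class Lambda:
    params: Tuple[str, ...]   # parameter names only, e.g. ("x",)
    body: str                 # kept as opaque text in this sketch

def type_lambda(lam, expected):
    """Pair each lambda parameter with the type the context demands."""
    if expected is None:
        raise TypeError("a lambda needs a context type (e.g. a func-pointer target)")
    if len(lam.params) != len(expected.params):
        raise TypeError(
            f"lambda takes {len(lam.params)} args, "
            f"context expects {len(expected.params)}")
    return dict(zip(lam.params, expected.params)), expected.result
```

Passing a one-parameter lambda where a `func(byte):byte` is expected would type its parameter as byte and its result as byte. Without such a context there is nothing to infer from, which is why these short forms only make sense inside expressions.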
28 May 2012 08:06:31 pm by merthsoft
I prefer 1, 2, or 4. 4 seems like a good compromise.

28 May 2012 09:25:54 pm by shkaboinka
I am leaning heavily toward [1] because it's the easiest to spot (very standard), is very readable ((A)=>B practically speaks for itself), is the smallest (which is the whole point), and is hard to confuse with anything else. ... This kind of lambda saves at LEAST 13 characters, and even more after removing datatypes! (Thanks, Merthsoft and Benryves, for suggesting lambdas)

© Copyright 2000-2013 Cemetech & Kerm Martian :: Mobile Design by Alex "comicIDIOT" Glanville
Problems? Issues? Or Suggestions? There's a thread for that!
