As some of you know, I've been working on a linker for the eZ80 model calcs, whose source code you can find on GitHub. Right now, the linker only supports parsing Zilog's object files; it can't actually link yet. However, since I can fully parse object files, it's time to write the actual linking code. There's also been a request to port the linker to C. Since Zilog's own compiler is Windows-only, I don't see the advantage, but it's something I'm considering.

There are three main goals here: First, to allow LibLoad libraries to be written in C. Second, to allow the production of relocatable programs. This would allow programs to be relocated into flash for execution. It would also allow writing programs larger than 64 K; the on-calc loader would resolve the references between subprograms automatically (think Lua). Finally, to allow the production of apps. (I still haven't sent that email. There will be no patches until TI gets a chance to respond.)

EDIT: Apparently, Mateo thought I was re-doing the calculator-side stuff. Not at all; the current calculator-side stuff is fine. But the current system of building libraries has some convoluted hacks in it, and a hack is also used to allow C and assembly programs to consume LibLoad libraries. The first goal of the new linker is to simplify the linking process and provide more flexibility by having the linker natively support our formats. With a linker that natively supports our formats, we can also consider implementing shells (maybe DCSE, Kerm :p) that support relocatable programs and programs larger than 64 K.

As I understand it, the linking process is conceptually simple: Load the object files, collate the sections from individual object files, order the sections, fix their final addresses, resolve intramodule relocations, resolve intermodule relocations, and package the data into the proper output format. If the desired output itself should be relocatable, then the resolved references will be relative.
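Roughly, I imagine the top level of the linker looking something like the sketch below. Every type and method name in it is a placeholder I made up for illustration; none of this exists in the repo yet.
Code:
// Placeholder sketch of the linking pipeline; none of these names exist yet.
var modules = inputPaths.Select(ObjectModule.Load).ToList(); // load the object files
var sections = CollateSections(modules);             // merge same-named sections across modules
OrderAndAssignAddresses(sections);                   // order sections and fix final addresses/offsets
foreach (var module in modules)
    ResolveIntramoduleRelocations(module, sections); // references within one module
ResolveIntermoduleRelocations(modules, sections);    // references between modules
outputFormat.Write(sections, outputPath);            // package into 8xp/app/library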

The IEEE-695 OMF format Zilog uses is very flexible. In fact, it's far more flexible than is useful to us. Mercifully, although the OMF format allows data records to occur out-of-order, it appears that Zilog doesn't do that. If it did, I'd have to build a database of data records before I could collate them. But cheating and just assuming records come in order is much easier, so I plan on cheating.

I'm going to have to study the LibLoad format in more detail so I can generate it. I also don't feel like figuring out the details of Zilog's .LIB format, because it looks like the .LIB generator has its own special hacked-together code for generating OMF files, and it's ugly. So I'd like to just parse the object files from a library, link them myself (and skip that awkward Spasm step), and generate my own dummy format for use with my linker so it can add the correct LibLoad information to programs using LibLoad. Essentially, generating a library would be exactly like generating a regular program, except that in addition to a .8xv LibLoad appvar, you also get a library file for use with my linker.

I'm told that jacobly knows how the current library building system works. Mateo seemed to indicate that the .LIB file was critical to programs using LibLoad (something about generating the right jump table for the program using the library), but since I'm writing my own linker, I don't see why it wouldn't be desirable to use my own format. Maybe I don't understand things well enough.

Comments and guidance would be helpful.
I've started a sort of v2 of the OMF parser that's simplified to deal with Zilog's stuff specifically. Additionally, I've abstracted OMF out into a generic object module class, to allow support for input formats from other tool chains.
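To give an idea of what that abstraction looks like, here's a rough sketch. The member names are illustrative only, not the actual class:
Code:
// Hypothetical shape of the generic object module class; the real member
// names differ. An OMF-specific parser populates one of these, and other
// tool chains' formats could get their own parsers later.
abstract class ObjectModule
{
    public string Name;                     // module name from the object file
    public List<Section> Sections;          // code/data sections and their contents
    public List<Symbol> PublicSymbols;      // symbols this module exports
    public List<Symbol> ExternalSymbols;    // symbols this module imports
    public List<Relocation> Relocations;    // patch points to fix up at link time
}

class OmfObjectModule : ObjectModule
{
    // Parses one of Zilog's IEEE-695 OMF files into the generic representation.
}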

Zilog's compiler's use of relocation expressions has some idiosyncrasies. Most of the time, it outputs a 24-bit address as three 8-bit expressions, which produce the three bytes one-by-one using a shift-and-mask pattern. However, it also sometimes produces a single 24-bit expression. Strangely, Zilog specifies the file as being in big-endian mode, which means that the 24-bit expression also has its endianness reversed!
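To make the shift-and-mask pattern concrete, here's roughly what the three 8-bit expressions evaluate to for a made-up address (this is my paraphrase in C#, not Zilog's actual expression encoding):
Code:
// Shift-and-mask pattern: a 24-bit address emitted one byte at a time.
int address = 0xD1A881;                       // made-up final address of some symbol
byte low  = (byte)( address        & 0xFF);   // 0x81: first  8-bit expression
byte mid  = (byte)((address >>  8) & 0xFF);   // 0xA8: second 8-bit expression
byte high = (byte)((address >> 16) & 0xFF);   // 0xD1: third  8-bit expression

// The occasional single 24-bit expression is written in the file's declared
// big-endian byte order, so (as I read it) the linker has to swap the bytes
// back before dropping the value into the little-endian eZ80 image:
int swapped = (low << 16) | (mid << 8) | high; // 0x81A8D1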

Also, I've found that Zilog rarely, but nevertheless sometimes, uses more than just basic addition in its relocation expressions, which means that the object file class actually has to support real relocation expressions. Unfortunately, this also means that any relocatable output formats the linker supports will need their own special code to unobfuscate Zilog's expressions. I'll work on seeing how I can make that less awkward, though.
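For a rough idea of what "real relocation expressions" means for the object model, something like the following hypothetical expression tree would do; the names are mine, not Zilog's or the linker's:
Code:
// Hypothetical expression-tree representation for relocations; these names
// are mine. A plain "symbol + offset" fixup is just a small tree, but the
// rarer expressions need shifts, masks, and so on.
abstract class RelocExpression { }

class SymbolRef   : RelocExpression { public string SymbolName; }
class SectionBase : RelocExpression { public string SectionName; }
class Constant    : RelocExpression { public int Value; }
class BinaryOp    : RelocExpression
{
    public string Operator;               // "+", "-", ">>", "&", ...
    public RelocExpression Left, Right;
}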

Currently, the linker can load and parse an OMF file, but it doesn't yet support relocations. Once I'm happy with how relocations are supported, I'll have to start on the actual linking phase, the first task of which is computing final addresses/offsets for the sections collated from different object files. And, of course, I'll have to figure out how the parser gets the information on where to place sections.
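The address-assignment step itself should be simple enough; conceptually it's just sequential placement, something like this sketch (placeholder types, and the ordering policy has to come from the command line):
Code:
// Sketch of sequential section placement (placeholder types). Each collated
// section starts where the previous one ended; for a relocatable output,
// these would be offsets rather than absolute addresses.
int address = baseAddress;                 // e.g. userMem for a static 8xp
foreach (var section in orderedSections)
{
    section.BaseAddress = address;
    address += section.Length;             // the next section follows immediately
}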
It turns out that Zilog's .lib format isn't completely, inexcusably awful. A .lib file contains a bunch of OMF subfiles, which are stored verbatim. The .lib file header is a wrapper that looks like an OMF file header, but is actually just a C-style struct with some funny padding bytes every now and then. Because the header is really a struct with only 100 slots, a .lib file is limited to 100 input object files. But the struct format is still an awful thing to put in an OMF file.

The .NET platform's library of basic data structures is decent enough to make the actual linking logic fairly easy to write. It's probably algorithmically slow, but I suspect we won't ever have object files large enough for that to be a problem. With that settled, I've found that before I can continue work on the linking logic, I have to process input arguments.

The current first goal for output support is writing LibLoad libraries in C. Currently, this can only be done in assembly, because the programmer has to tag every relocation manually. Additionally, the assembly has to be done with Spasm, because it does some evil macro stuff to generate the relocations list at assembly time.

To simplify the process of creating LibLoad libraries, I've come up with the idea of a LibLoad Stub file, which just associates the external names of library functions with their internal function names and their external function numbers. The stub file is like a header file, basically. It essentially performs the same function as the function() macro used in the Spasm build process, except that no macros are needed, because the linker generates the function table data based on the stub file. Moreover, the linker extracts relocation information from Zilog's OMF binaries instead of needing relocations to be tagged manually. Thus, the linker supports both C and assembly equally well. The old system with awkward Spasm macros can still be used, but if you use Zilog's assembler, you won't have to remember to tag every relocation.
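I haven't nailed down the stub file syntax yet, but conceptually each entry is just a triple, and parsing it should be trivial. Something like this (both the line format and the code are hypothetical):
Code:
// Hypothetical stub entry: the linker would read lines of the form
//     external_name  internal_name  function_number
// and build the library's function table from them. Syntax is not final.
class StubEntry
{
    public string ExternalName;    // the name C/asm programs link against
    public string InternalName;    // the symbol inside the library's object files
    public int FunctionNumber;     // index into the LibLoad jump table

    public static StubEntry Parse(string line)
    {
        var parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return new StubEntry
        {
            ExternalName = parts[0],
            InternalName = parts[1],
            FunctionNumber = int.Parse(parts[2]),
        };
    }
}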

So here's what I'm going with:
Code:
ezcalclink [options] [input files]
Options:
  --8xp, --app, --makelib [<stub file name> <.lib file name>]
    Specifies output mode: statically linked 8xp, TI app (if TI ever stops being obstinate about that), or LibLoad library.
    For LibLoad libraries, specifies the name of the stub file for a library, and the name of the .lib file to produce
  --outfile <name>
    Specifies output file name
  --static <address> <section name>[,<section name 2>[,<section name 3>[,&c.]]]
    Specifies that <section name> should appear at <address>.
    If more than one section name is given, then each additional section is located sequentially.
  --relocatable <section name>[,<section name 2>[,<section name 3>[,&c.]]]
    Specifies that <section name> is relocatable.
    If more than one section name is given, then each additional section is located sequentially.
    If more than one block of relocatable sections is given, it's an error.

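For example, a statically linked program and a LibLoad library might be built something like this (the file names, section names, and address are made up for illustration):
Code:
ezcalclink --8xp --static 0xD1A881 CODE,DATA --outfile DEMO.8xp main.obj gfx.obj
ezcalclink --makelib graphics.stub graphics.lib --relocatable CODE,DATA --outfile GRAPHICS.8xv draw.obj text.obj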

So I'm currently working on input argument parsing. Hopefully, then I can finally get back to the linking logic.
Quote:
It's probably algorithmically slow, but I suspect we won't ever have object files large enough for that to be a problem.

Hmm. Forgive my potentially unwanted interference, especially as I'm largely an outsider to the TI-eZ80 development community, but I'd like to tell a related story from the TI community, where a pretty similar line of thinking yielded a suboptimal outcome :)

About a decade ago, in the "new", optimizing linker for the TI-68k development environment, all data structures were O(n) linked lists instead of O(log n) trees or O(1) amortized hash tables. The main reasons the linker's makers gave for that choice were coding simplicity and the low enough complexity of the usual programs.
As a matter of fact, from a user's POV, the efficiency of linked lists remained acceptable, even on then-contemporary computers, for the usual <64 KB programs: C compile time is usually larger than link time, which is usually under a second anyway, so shaving several hundred ms off the link time wouldn't make builds that much faster. The data structure choice was therefore arguably suboptimal, but quite far from irresponsible, for the linker's main use case.
However, the story was quite different for "large" programs made from "many" sections and symbols. PpHd reported insane times - a dozen minutes! - linking the full version of PedroM, with PpHd's math stuff, which was several hundred KB. While attempting to optimize the program (reordering sections, cutting ranges - from a user's POV, the set of optimizations is nifty), the linker was essentially spending all its time traversing its linked lists. Obviously, a 2016 i7-6700HQ or i7-6700K would yield better wall-clock times than PpHd's computer, which already wasn't run-of-the-mill back then - but still well into the minutes range, which is not really acceptable :)
A more complex version of christop's Punix, or FlashApps (had TI-68k FlashApp support ever been added to the linker; it never was, due to the abysmally low number of current and potential users), would have hit the same issue.

And yet, that linker is written in C, without external dependencies, rather than in:
* one of the higher-level languages available at the time that don't usually go through AoT compilation to native code, mainly Java and the .NET family - neither of which has a lightweight execution environment preinstalled on (any, most) of the platforms targeted by the linker;
* or, obviously, the 2010s languages compiled to native code but aimed at better safety: mainly Go, Swift (still not suitably portable), and Rust - which were not even in the works at the time, at least the latter two.

Switching to an RB-tree for the data structure(s) where it mattered is on GCC4TI's todo/wish list, and I had mostly chosen a GPLv2'ed RB-tree implementation for that purpose - IIRC, the Linux kernel's. I didn't go further, because the todo/wish list contained, and partially still contains, other higher-profile (and easier) bugfixes and improvements, and also (mainly?), shortly thereafter, I became the libti*/gfm/tilp maintainer. Those were not really my favorite projects to work on; I'd have found it more fun to work on GCC4TI, whose code base I knew better anyway. However, libti*/gfm/tilp are useful to more people than a narrow set of TI-68k programmers, and remain useful to date...


TL;DR: I think you should attempt to avoid data structures (and programming languages + execution environments, but you've already partially covered that by presumably not using a scripting, dynamically typed, non-JIT'ed language on top of the .NET platform) which could predictably hamper the making of large programs for the TI-eZ80 series, such as third-party OSes or FlashApps (not that TI will let us make them officially, but it's not like we should pay attention to their wishes) :)
"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming."
---Donald Knuth

I imagine that the C code you're talking about has lots of loops that look like this:
Code:
for (nodeStruct *thingy = first; thingy != NULL; thingy = thingy->next) { do_stuff(thingy); }
It's so easy to write a lot of code like that. And naturally, refactoring that to use trees would be a major PITA.

But I'm the poster child for abstraction, and I don't have to write my own data structures. .NET's Dictionary class is hash-based. So maybe it'll eat 50 MB of RAM to link a 64 K program. I'd prefer to spend my time debugging a linker, not reinventing the same wheel that's been invented a dozen times before.
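For example, the entire global symbol table can just be a Dictionary keyed by name (Symbol and the function name here are hypothetical):
Code:
// The entire global symbol table, courtesy of the standard library.
var symbols = new Dictionary<string, Symbol>();   // symbol name -> defining symbol

// Resolving an external reference is an amortized O(1) lookup:
Symbol target;
if (!symbols.TryGetValue("lib_draw_sprite", out target))   // hypothetical name
    throw new Exception("Unresolved external: lib_draw_sprite");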

Maybe, once it's working well, a C++ port would be in order. At the moment, Zilog's compiler is already Windows-only, and Windows comes with the .NET runtime (unless you're using something terribly outdated). I happen to be more familiar with C# than with The Mother of All Memory Leaks. So tell you what: I'll do a C++ port if LLVM ever manages to generate valid eZ80 code. In the meantime, I'm going to focus on writing a program that does a task whose purpose most programmers can't even recall.

EDIT: I didn't mean for that to sound flame-y. My point is simply that I feel it's too early to worry about optimization, and that with good structure, refactoring code is easy.
  