Ashbad wrote:
- What are the purposes of .o files in SDCC and/or GCC?
- How do linker files work? What exactly do they do? How do you specify them and set them up in GCC and/or SDCC?
- How can you specify the beginning address from which code starts? Like .org $9D95 in general-assemblers for z80 assembly?
- How can you specify links to syscalls that an OS already has?
- What is the point of crt0.s? I saw SDCC and GCC have one.
These are all closely related. Simply put, the compiler itself generates an intermediate object file (.o) which includes the generated code as well as a bunch of other useful information. The linker takes one or more object files plus a linker 'file' (more usually, a 'linker script') and generates a file which can be directly executed on the target system. Using the Prizm GCC toolchain as an example, let's make a file nothing.c:
Code: const int foo[] = {0, 1, 2, 3};
int main(int argc, char **argv) {
    volatile int i = foo[2];
    // Do-nothing loop
    for (i = 0; i < 4096; i++);
    return 0;
}
Start by compiling to an intermediate file (-c to compile only, no linking), and we'll have the compiler put it in nothing.o:
Code: $ sh3eb-elf-gcc -c -mb -m4a-nofpu -o nothing.o nothing.c
We can examine the contents of the output file (which is ELF for this particular configuration of the compiler) with objdump (we'll use -hs to show the section headers and the full contents of each section):
Code: $ sh3eb-elf-objdump -hs nothing.o
nothing.o: file format elf32-sh
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000050 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000000 00000000 00000000 00000084 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 00000000 00000000 00000084 2**0
ALLOC
3 .rodata 00000010 00000000 00000000 00000084 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .comment 00000012 00000000 00000000 00000094 2**0
CONTENTS, READONLY
Contents of section .text:
0000 7ff461f3 71cc114e 61f371cc 115dd10f ..a.q..Na.q..]..
0010 521261f3 71cc112f 61f371cc e200112f R.a.q../a.q..../
0020 a0080009 61f371cc 511f6213 720161f3 ....a.q.Q.b.r.a.
0030 71cc112f 61f371cc 521f9106 32178bf1 q../a.q.R...2...
0040 e1006013 7f0c000b 00090fff 00000000 ..`.............
Contents of section .rodata:
0000 00000000 00000001 00000002 00000003 ................
Contents of section .comment:
0000 00474343 3a202847 4e552920 342e362e .GCC: (GNU) 4.6.
0010 3000 0.
We see five sections in this file. .text is the section that contains executable code, .data contains writable data which has initial values that must be in place at runtime, .rodata is like .data but never written to, .bss is scratch space which will be initialized to zero at runtime, and .comment is an informative section which tells anything that might read this ELF file that it was generated by GCC 4.6.0 (in this case, anyway; it might contain anything the compiler wants to include). The second line of each section's description shows what flags are set on that section. We can see that .text is flagged CODE and READONLY, for example, and both it and .data are flagged LOAD, meaning they'll be loaded straight into memory before execution begins.
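To make the section list a bit more concrete, here's a small sketch of where typical C objects tend to end up. The names (lookup_table, hit_counter, scratch_buf, sum_table) are made up for illustration, and the placement in the comments is what GCC usually does, not a guarantee; it can change with target and flags.

```c
#include <stddef.h>

/* Read-only and initialized: typically emitted into .rodata. */
const int lookup_table[] = {10, 20, 30, 40};

/* Writable with a non-zero initializer: typically emitted into .data. */
int hit_counter = 5;

/* Writable with no initializer: typically .bss, or COMMON on older GCC
   defaults (a linker script can fold COMMON into .bss). */
int scratch_buf[16];

/* Function bodies are machine code, so they go into .text. */
int sum_table(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof lookup_table / sizeof lookup_table[0]; i++)
        total += lookup_table[i];
    return total;
}
```

Compiling something like this with -c and running objdump -h on the result should show non-zero sizes for .data and .rodata, unlike the nearly empty nothing.o above.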
The VMA and LMA columns in objdump's output indicate where in memory each section is expected to be loaded. Since we only had gcc compile this for us and didn't link, all the section addresses are at their defaults, 0. We can see what will change with linking by looking at the relocations:
Code: $ sh3eb-elf-objdump -r nothing.o
nothing.o: file format elf32-sh
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000004c R_SH_DIR32 _foo
This tells us that there's a value at .text+0x4c bytes which is a 32-bit integer used to refer to the address of the symbol _foo (that is, our const array). With that information, when we link and the absolute addresses of sections change, the linker can change the value at that location so the code points to the right places and runs correctly. Looking at the dump of .text above, we see that the current value of that is 0x00000000.
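As a rough illustration of what the linker does with that record, here's a sketch in C of applying one absolute 32-bit relocation to a section buffer. This is not ld's actual code: apply_dir32 and its helpers are hypothetical names, and I'm assuming the addend lives in place at the relocated location (some targets keep it in the relocation record instead). The byte order is big-endian because we compiled with -mb.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value to a byte buffer in big-endian order. */
static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Patch the word at section+offset so it holds the symbol's final
   address plus whatever addend was already stored there (assumption:
   in-place addend). */
void apply_dir32(uint8_t *section, size_t offset, uint32_t symbol_addr)
{
    uint32_t addend = read_be32(section + offset);
    write_be32(section + offset, addend + symbol_addr);
}
```

For example, if _foo were to end up at 0x00300050 (an illustrative address, not one taken from a real link), apply_dir32(text, 0x4c, 0x00300050) would turn the 00000000 at .text+0x4c into 00300050.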
Now let's link this into a binary that we could run on the Prizm. Here's the linker script for reference:
Code: OUTPUT_FORMAT(binary)
OUTPUT_ARCH(sh3)
/* Entry point. Not really important here, since doing binary output */
ENTRY(initialize)
MEMORY
{
    /* Loads code at 300000, skips g3a header */
    rom (rx) : o = 0x00300000, l = 512k
    ram (rwx) : o = 0x08100004, l = 64k /* pretty safe guess */
}
SECTIONS
{
    /* Code, in ROM */
    .text : {
        *(.pretext) /* init stuff */
        *(.text)
        *(.text.*)
    } > rom
    /* Read-only data, in ROM */
    .rodata : {
        *(.rodata)
        *(.rodata.*)
    } > rom
    /* RW initialized data, VMA in RAM but LMA in ROM */
    .data : {
        _bdata = . ;
        *(.data)
        *(.data.*);
        _edata = . ;
    } >ram AT>rom
    /* Uninitialized data (fill with 0), in RAM */
    .bss : {
        _bbss = . ;
        *(.bss) *(COMMON);
        _ebss = . ;
    } >ram
}
Run the linker:
Code: $ sh3eb-elf-ld -T prizm.ld -o nothing.bin nothing.o
bin\sh3eb-elf-ld.bfd.exe: warning: cannot find entry symbol initialize; defaulting to 0000000000300000
We should fix up that warning. Looking at the linker script, we see that the entry point (where execution begins) is set to the symbol 'initialize' by the ENTRY(initialize) directive. We didn't see such a symbol in the ELF file the compiler gave us, though. That's where crt0 comes in. crt0 is the runtime initialization code. It takes care of setting up memory and the machine's state in general so the C code can take over with the machine in the state it expects. I won't waste more space by pasting the whole of our Prizm crt0 here, but it defines a .pretext section and a few other symbols, including initialize. When it finishes initializing things (copying the original contents of .data into RAM, zeroing out .bss, and so on), it calls our C code's main function and handles cleaning up the environment when that function returns.
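For a feel of what that memory setup amounts to, here's a hedged C rendering of the two steps mentioned above. The real Prizm crt0 is not shown in this post, so this is only a sketch: crt0_prepare_memory is a hypothetical name, and the pointers are passed in as parameters so it can run on a PC. On the target they would come from linker-defined symbols such as _bdata, _edata, _bbss and _ebss; the load address of .data (data_load_addr here) has no symbol in the script above, so that parameter is purely an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Prepare RAM the way a crt0 typically does before calling main():
   1. copy the initial contents of .data from its load address (ROM)
      to its run address (RAM);
   2. clear .bss so uninitialized globals start out as zero. */
void crt0_prepare_memory(const unsigned char *data_load_addr,
                         unsigned char *bdata, unsigned char *edata,
                         unsigned char *bbss, unsigned char *ebss)
{
    memcpy(bdata, data_load_addr, (size_t)(edata - bdata));
    memset(bbss, 0, (size_t)(ebss - bbss));
}
```

After this runs, a real crt0 would go on to call main and deal with its return; that part is left out here.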
Looking back at the linker script, it says that the .text section in the output file should consist of the .pretext and .text sections from the input files, in that order. Due to how crt0 is set up, that puts initialize at the very beginning. The output .text and .rodata sections are to be placed in the memory region rom (> rom), defined towards the top of the file as a 512k chunk of read-only (and executable) memory at 0x00300000. .data gets its VMA in ram (>ram), but its LMA in rom (AT>rom), meaning the initial contents are stored in rom, but the code expects to find them in ram (so crt0 is expected to copy them into RAM beforehand).
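The address assignment described here can be sketched as a pair of cursors, one per memory region. This is a simplification of what ld does, with made-up helper names (align_up, place_section), and it only handles power-of-two alignment.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Place a section at the region's cursor, advance the cursor past it,
   and return the address the section was given. */
uint32_t place_section(uint32_t *cursor, uint32_t size, uint32_t align)
{
    uint32_t start = align_up(*cursor, align);
    *cursor = start + size;
    return start;
}
```

Starting with a rom cursor at 0x00300000 and a ram cursor at 0x08100004, placing nothing.o's 0x50-byte .text and then its 0x10-byte .rodata in rom gives 0x00300000 and 0x00300050. A .data section would take its VMA from the ram cursor and its LMA from the rom cursor, which is exactly the >ram AT>rom split.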
If you build crt0 into an object file the same way we did our .c file earlier and add it as an argument to ld, the output will include all the code (nothing.c and the requisite crt0). I leave that as an exercise.
This post is getting pretty long, so I won't do it here, but I highly recommend linking again and outputting another ELF file instead of the straight binary we usually output with prizm.ld (just change the first line to read OUTPUT_FORMAT(sh-elf)), then checking it out with objdump to see how the section addresses have changed. You may also want to try objdump -d (d for disassemble) to have a look at the machine code it generated. Consider it another exercise.
When linking against OS libraries, you either have a known address they can be found at, or (more usually) the system provides a dynamic linker which handles all the relocations at runtime, after loading your shared libraries.
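For the known-address case, the usual C trick is to cast the address to a function pointer and call through it. Here's a minimal sketch under some loud assumptions: the routine takes one int and returns an int, the names (os_routine_t, call_os_routine, demo_routine) are invented, and demo_routine is only a stand-in so the example can run on a PC. Converting between integers and function pointers is implementation-defined, though it works on the common toolchains.

```c
#include <stdint.h>

/* Assumed signature of the OS routine; a real one may differ. */
typedef int (*os_routine_t)(int);

/* Call the routine that lives at a fixed, known address. */
int call_os_routine(uintptr_t address, int argument)
{
    os_routine_t routine = (os_routine_t)address;
    return routine(argument);
}

/* Stand-in for an OS routine, used only to try this out on a host machine. */
static int demo_routine(int x)
{
    return x * 2;
}
```

On a PC you can try it with call_os_routine((uintptr_t)demo_routine, 21); on real hardware you would pass whatever address the OS documentation gives for the routine, and follow its actual calling convention.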
Ashbad wrote:
- A bit unrelated, but what is a .a static library file?
Just an archive file containing one or more packed object files, which the linker can pull out and include in your output file as needed.
Ashbad wrote:
- (SDCC only) how can you specify a .C file to be compiled into intel hex format (.hex) so it can be used for something else in a z80 toolchain (like convert it to a certain program format)? I tried doing something like:
Code: SDCC -c source.c -o source.hex -intel
It made literally ALL other files besides the .hex. And, how do I specify SDCC to compile to c99 standards?
I suggest you RTFM. If I had to hazard a guess, it doesn't work because sdcc doesn't support what you're trying to make it do, but I haven't actually looked at the manual.