So I started coding x86 assembly a week ago, and I tried comparing the size of the output programs in C and assembly.

First of all, a simple program that displays "Hello World".

C Code

Code:
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}


x86 ASM Code

Code:

section .data
    hello:     db 'Hello World',10     ; 'Hello World' plus a linefeed character
    helloLen:  equ $-hello             ; Length of the 'Hello World' string
   
section .text
    global _start

_start:
    mov eax,4            ; The system call for write (sys_write)
    mov ebx,1            ; File descriptor 1 - standard output

    mov ecx,hello        ; Address of the string to print
    mov edx,helloLen     ; Number of bytes to print

    int 80h              ; Call the kernel to display the string
   
    mov eax,1            ; The system call for exit (sys_exit)
    mov ebx,0            ; Exit with return code of 0 (no error)
    int 80h              ; Exit program


Size of the output in C: 8.2KB
Size of the output in ASM: 1.2KB

I thought the difference was huge and tried another way to do it in C (more low level way):


Code:
#include <unistd.h>

int main(void)
{
    write(1,"Hello World\n",12);
    return 0;
}


12 is the number of bytes (I counted them, to make it smaller)

Size of the output: 8.2KB

The size is exactly the same, probably because GCC does all the Maths before creating the final program and that makes it have the same size.

Then I tried a program that prints "Hello" forever, one per line.

x86 ASM Code

Code:
section .data
    hello:     db 'Hello',10           ; 'Hello' plus a linefeed character
    helloLen:  equ $-hello             ; Length of the 'Hello' string
   
section .text
    global _start

_start:
    mov eax,4            ; The system call for write (sys_write)
    mov ebx,1            ; File descriptor 1 - standard output

    mov ecx,hello        ; Address of the string to print
    mov edx,helloLen     ; Number of bytes to print

    int 80h              ; Call the kernel to display the string
    jmp _start           ; Loop forever (jnz depended on whatever the zero flag happened to hold)
   
    mov eax,1            ; The system call for exit (sys_exit)
    mov ebx,0            ; Exit with return code of 0 (no error)
    int 80h              ; Exit program


C Code

Code:

#include <unistd.h>

int main(void)
{
    while (1) {
        write(1,"Hello\n",6);
    }
}


Output of C: 8.2KB
Output of ASM: 1.3KB

I really don't know if I did anything wrong in my experiments. Any conclusions on your side or opinions? Thanks
Good experiment, but I think it'll be more valid if you use asm() statements in the C program to have your assembly inline, and see what it does size-wise. If it's still 8.2 KB that way, that means it's including more than just the code; maybe it's formatted with a header similar to a .exe one (though, you use Linux, right?)

Other than that, no idea, I'm not the one to really know.
I was going to suggest something similar, I think that C Programs automatically have something to close open file pointers / free() malloc'd memory. Probably wrong, but I don't know.
I'm curious what file type that is to have a 1.3 KB file with what is probably less than a hundred bytes of actual ASM code*.

*Plus the initialization stuff.
Not to mention linking. The C program might have the libraries fully included in the binary (static linking). Try linking it against a shared library.
I did all of this on Linux. The output file for ASM is an executable file (I mean like .out's), and so is the C executable.

I used GCC and NASM.
Try out Ashbad's idea, C program with asm(), and also benryves', who had said to check the disassembly of the C program.
Strip and optimize your C binaries, and your binary should get smaller (I don't know offhand what sort of optimizations GCC does without being asked):

Code:
$ gcc -Os -s foo.c

That probably won't optimize much, but stripping should help.

There's a fair amount of overhead in the runtime setup and teardown you use with the main interface. Change _start to main in your assembly program and it should get much larger, since you're linking against crt0 (or similar). ELF headers and such (which the system demands to make things runnable) consume a reasonable chunk of space as well.

For fun with breaking specifications (and the more general stuff I mentioned here), have a look at creating teensy ELF executables.
Hello World in 142 bytes: http://timelessname.com/elfbin/
There is indeed a lot of wrapping and safety that your C compiler is adding. Here's a fun experiment:

1) run gdb yourexecutablename
2) Set a breakpoint on the first line of main()
3) Run the "run" command to start the program and trigger the breakpoint
4) Do "disas" and check out the code your C compiler generated.
Try this C program:

Code:
#include <unistd.h>

void _start()
{
        write(1,"Hello World\n",12);
        _exit(0);
}


Compile with "gcc -s -nostartfiles helloworld.c". On my system I got a 1816-byte program.

The -nostartfiles option leaves out the C runtime startup code, which runs before main() and accounts for quite a bit of the overhead. Also, if you can, add the option "-static" to make it a statically-linked binary. The relocation information probably adds a lot of overhead (relatively speaking) too.

EDIT:
_player1537 wrote:
I was going to suggest something similar, I think that C Programs automatically have something to close open file pointers / free() malloc'd memory. Probably wrong, but I don't know.

You're partially right. Programs on Linux don't have to free any memory when they exit; all their memory is reclaimed by the kernel regardless. The C runtime code does close or at least flush open files when the program exits.

Fun fact: the exit() function that you call from C is not the direct system call. C's version cleans up (such as flushing all open files) and calls the functions that were registered with atexit(), and then it calls the exit system call. The _exit() or _Exit() function is the real direct system call, the same that you would call from asm.
Of course, this is all purely academic. You should *NEVER* try and prevent the C setup/destroy routines from running. The file size reduction really isn't worth it (actually, the file size reduction is largely non-existent. Your 1.2kb assembly version is actually 4kb on most file systems anyway)
Kllrnohj wrote:
Of course, this is all purely academic. You should *NEVER* try and prevent the C setup/destroy routines from running. The file size reduction really isn't worth it (actually, the file size reduction is largely non-existent. Your 1.2kb assembly version is actually 4kb on most file systems anyway)

You're right. There's nothing really practical about bypassing the C runtime like this (unless one is writing a replacement for libc). I personally believe that it's important to understand what is going on under the hood, though.
  