Hello world in C inline assembly (2018)

10 days ago (jameshfisher.com)

As other comments have noted, the asm statement needs to have its input/output registers specified to ensure the compiler doesn't erase the "unused" values.

Working example: https://john-millikin.com/unix-syscalls#linux-x86-64-gnu-c

Adapted to use main():

  static const int STDOUT = 1;
  static const int SYSCALL_WRITE = 1;
  static const char message[] = "Hello, world!\n";
  static const int message_len = sizeof(message) - 1;  /* exclude the trailing NUL */

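  int main(void) {
   /* 64-bit types so the full registers are defined, not just the low halves */
   register long        rax __asm__ ("rax") = SYSCALL_WRITE;
   register long        rdi __asm__ ("rdi") = STDOUT;
   register const char *rsi __asm__ ("rsi") = message;
   register long        rdx __asm__ ("rdx") = message_len;
   __asm__ __volatile__ ("syscall"
    : "+r" (rax)                        /* in: syscall number, out: result */
    : "r" (rdi), "r" (rsi), "r" (rdx)
    : "rcx", "r11", "memory");          /* syscall overwrites rcx and r11 */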
   return 0;
  }

Test with:

  $ gcc -o hello hello.c
  $ ./hello
  Hello, world!

  • Or just

      int main(void) {
        asm volatile("syscall" : : "a"(1), "d"(13), "D"(1), "S"("hello world!\n"));
        return 0;
      }
    

    Though the clobber list is a weak spot; I don't know exactly what it should contain in this case.
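
On the clobber question: as far as I know, the Linux x86-64 `syscall` instruction overwrites rcx and r11, and the kernel hands its result back in rax, so rax has to be an output rather than a plain input. A minimal sketch under those assumptions (the function name `hello_syscall` is made up for illustration, and the "memory" clobber is there so the compiler does not assume the buffer contents are irrelevant to the asm):

```c
/* Linux x86-64 only. Returns whatever the kernel leaves in rax. */
static long hello_syscall(void)
{
    static const char msg[] = "hello world!\n";
    long ret;

    __asm__ __volatile__ (
        "syscall"
        : "=a" (ret)                  /* rax out: bytes written, or -errno */
        : "0" (1L),                   /* rax in: syscall number 1 = write  */
          "D" (1L),                   /* rdi: fd 1 = stdout                */
          "S" (msg),                  /* rsi: buffer                       */
          "d" (sizeof(msg) - 1)       /* rdx: length without the NUL       */
        : "rcx", "r11", "memory");    /* registers the instruction trashes */
    return ret;
}
```

Called from main(), this returns the number of bytes written on success, which is 13 for this string.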

> This C program doesn’t use any C standard library functions.

This is only half true. While the code doesn't call any stdlib functions, it still relies on the C standard library and runtime to get called and to exit properly.

I'm somewhat perplexed why the author kept the runtime instead of building with -ffreestanding, given that the code doesn't really depend on any of its features (except maybe the automatic exit code handling).

Actually more readable than the AT&T syntax :)

But does this work on both GCC and Clang, and is it safe from being optimized away? edit: the answer is no

Turbo Pascal had an integrated assembler that could use symbols (and even complex types) defined anywhere in the program, like this:

    procedure HelloWorld; assembler;
    const Message: String = 'Hello, world!'^M^J;  {Msg+CR+LF}
    asm
        mov  ah,$40  {DOS system call number for write}
        mov  bx,1    {standard output}
        xor  ch,ch   {clear high byte of length}
        mov  cl,Message.byte[0]
        mov  dx,offset Message+1
        int  $21
    end;

  • Not only Turbo Pascal: this saner approach to inline assembly was quite common among PC compilers, regardless of the programming language.

  • Thanks for making me extremely sentimental for the hundreds of Turbo Pascal projects I did back in the day - this particular example highlights the elegance and clarity of the language, which we still seem to resist in our modern tooling.

    • I don't really see what's "elegant" about the code, could you elaborate? (This isn't a jab at GP. I'm just curious about what I'm not seeing.)


When I compile it with GCC 12, this machine code results:

    1129:       f3 0f 1e fa             endbr64 
    112d:       55                      push   rbp
    112e:       48 89 e5                mov    rbp,rsp
    1131:       b8 01 00 00 00          mov    eax,0x1
    1136:       bf 01 00 00 00          mov    edi,0x1
    113b:       48 8d 05 c2 0e 00 00    lea    rax,[rip+0xec2]        # 2004 <_IO_stdin_used+0x4>
    1142:       48 89 c6                mov    rsi,rax
    1145:       ba 0f 00 00 00          mov    edx,0xf
    114a:       0f 05                   syscall 
    114c:       b8 00 00 00 00          mov    eax,0x0
    1151:       5d                      pop    rbp
    1152:       c3                      ret    

Can you spot the error?

. . . . . .

The code clobbers rax when it loads the string address, so the system call number is lost and nothing gets printed. Moving the string assignment to the very first line of main fixes it.

BTW, Clang 14 with no optimization accepts the code without issue but compiles it without using any of the registers; it just stores the values to memory locations and runs the syscall opcode. With -O1 optimization or higher, it optimizes away everything except the syscall opcode.

  • The exact same thing happens with GCC 12 with 32-bit MIPS.

      #include <asm/unistd.h>
       
      char msg[] = "hello, world!\n";
       
      int main(void)
      {
          register int syscall_no asm("v0") = __NR_write;
          register int arg1       asm("a0") = 1;
          register char *arg2     asm("a1") = msg;
          register int arg3       asm("a2") = sizeof(msg) - 1;
       
          asm("syscall");
       
          return 0;
      }
    
      root@OpenWrt:~# objdump --disassemble=main
      ...
      00400580 <main>:
        400580: 27bdfff8  addiu sp,sp,-8
        400584: afbe0004  sw s8,4(sp)
        400588: 03a0f025  move s8,sp
        40058c: 24020fa4  li v0,4004
        400590: 24040001  li a0,1
        400594: 3c020041  lui v0,0x41
        400598: 24450650  addiu a1,v0,1616
        40059c: 2406000e  li a2,14
        4005a0: 0000000c  syscall

  • With an older version, it works (as long as there is no optimization at least, with -O2 all the register init code disappears):

    $ gcc -v

    ... gcc version 10.2.1 20210110 (Debian 10.2.1-6)

        0000000000001125 <main>:
            1125: 55                    push   %rbp
            1126: 48 89 e5              mov    %rsp,%rbp
            1129: b8 01 00 00 00        mov    $0x1,%eax
            112e: bf 01 00 00 00        mov    $0x1,%edi
            1133: 48 8d 35 ca 0e 00 00  lea    0xeca(%rip),%rsi        # 2004 <_IO_stdin_used+0x4>
            113a: ba 0e 00 00 00        mov    $0xe,%edx
            113f: 0f 05                 syscall 
    

    No idea why a newer version produces worse code in this case (though of course, this way of doing inline assembly isn't "correct" anyway, so nasal demons may result)

Never seen inline assembly written quite like that, is this actually correct code? I'm concerned that normally register annotation is just a hint, and that the assembly blocks are not marked volatile - and that the compiler may therefore be free to rewrite this code in many breaking ways.

Edit: Ah, basic asm blocks are implicitly volatile. I'm still a little concerned the compiler could get clever and decide the register variables are unused and optimize them out.

  • Tried it with GCC, and without any optimization it does print the message. With "-O2" however, we get this:

        Disassembly of section .text:
        
        0000000000001040 <main>:
            1040: 0f 05                 syscall 
            1042: 31 c0                 xor    %eax,%eax
            1044: c3                    retq   
    

    Everything except the syscall instruction has been optimized away!

    • Now that's incredibly cursed. Could do basically anything and swallows the error too!
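
On the swallowed error: with extended asm the result can be captured and checked. My understanding of the Linux x86-64 convention is that a raw syscall reports failure by returning a value in the range -4095..-1 (the negated errno) in rax. A rough sketch under that assumption; `raw_write` and `raw_syscall_failed` are hypothetical names, not from any library:

```c
#include <stddef.h>

/* Linux x86-64 only: write(2) through a bare syscall instruction. */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;

    __asm__ __volatile__ (
        "syscall"
        : "=a" (ret)                              /* rax out: result      */
        : "0" (1L), "D" ((long)fd), "S" (buf), "d" (len)
        : "rcx", "r11", "memory");
    return ret;
}

/* Nonzero if ret looks like -errno rather than a normal result. */
static int raw_syscall_failed(long ret)
{
    return ret < 0 && ret >= -4095;
}
```

With this, writing to an invalid descriptor such as -1 comes back as a negative value (-EBADF) instead of silently doing nothing.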

  • I think that named register variables (a GCC extension) are meant to be live in asm blocks by design, so they shouldn't be optimized away.

    Still I would use extended asm.

    edit: from the docs: "The only supported use for [Specifying Registers for Local Variables] is to specify registers for input and output operands when calling Extended asm".

    So the example is UB.

    • It's not UB, it's documented behaviour of a vendor extension.

      It's not UB because it's defined as outside the scope of the language standard. The vendor (in this case, GCC) does document how to use its inline assembly extension in quite a lot of detail, including how to use clobber lists to prevent exactly the kind of thing these failures demonstrate.
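
For reference, the documented pattern quoted above (register variables used only as operands of an extended asm) looks roughly like this on Linux x86-64. Register variables are needed for r10, r8 and r9 because, as far as I know, GCC has no single-letter constraints for them. This is a sketch, not a vetted implementation; `raw_syscall6` is a hypothetical helper name:

```c
/* Linux x86-64 only. Arguments go in rdi, rsi, rdx, r10, r8, r9. */
static long raw_syscall6(long nr, long a1, long a2, long a3,
                         long a4, long a5, long a6)
{
    /* Pinned only so they can be passed as operands below. */
    register long r10 __asm__ ("r10") = a4;
    register long r8  __asm__ ("r8")  = a5;
    register long r9  __asm__ ("r9")  = a6;
    long ret;

    __asm__ __volatile__ (
        "syscall"
        : "=a" (ret)                          /* rax out: result          */
        : "0" (nr),                           /* rax in: syscall number   */
          "D" (a1), "S" (a2), "d" (a3),
          "r" (r10), "r" (r8), "r" (r9)
        : "rcx", "r11", "memory");            /* trashed by the syscall   */
    return ret;
}
```

Because every register is named as an operand, the compiler knows the values are used and keeps the initializations even at -O2.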


The `return 0;` is optional for main() in C, so the function body could be made to consist solely of inline assembly.

Is anyone aware of a similar example, for ARM assembly on macOS?