Hello World
First Program
The traditional first program is one that prints "Hello World!". Let's write it in RISC-V assembly:
.data
hello: .asciz "Hello World!\n"
.text
.global _start
_start:
# invoke write(1, hello, 13) system call
li a0, 1
la a1, hello
li a2, 13
li a7, 64
ecall
# invoke exit(0) system call
li a0, 0
li a7, 93
ecall
At first glance, this code may look cryptic, but we'll demystify it step by step!
Now, let's assemble and link it with the cross toolchain.
$ riscv64-linux-gnu-as hello.S -o hello.o
$ riscv64-linux-gnu-ld hello.o -o hello.out
Finally, we can execute our first program.
$ ./hello.out
Hello World!
System Calls
In short, what we've done is just invoke system calls.
The main task of our program is printing a string. To print a string, we have to somehow interact with I/O devices. However, in this example, we used the write system call instead of accessing the devices directly.
# invoke write(1, hello, 13) system call
li a0, 1
la a1, hello
li a2, 13
li a7, 64
ecall
A system call is an abstract interface that the operating system provides between our code and the hardware devices. In other words, we delegated the printing task to functionality that the operating system already implements.
A system call behaves much like an ordinary function call. In fact, the write system call has the following function signature.
extern ssize_t write (int fd, const void *buf, size_t n);
In our assembly code, we passed the arguments through the a0, a1 and a2 registers, the small storage locations inside the CPU that instructions operate on. Here a0 holds the value 1, the file descriptor for stdout, a1 holds the address of hello, and a2 holds the length 13. The call thus prints the 13 bytes of the string at hello to our terminal.
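To make the correspondence with the function signature concrete, here is a minimal sketch in Python rather than assembly. It assumes a POSIX-like system, where os.write is a thin wrapper around the write system call; the names message and write_message are made up for this illustration.

```python
import os

# The same bytes as the `hello` string, without the trailing NUL byte.
message = b"Hello World!\n"


def write_message(fd: int, buf: bytes) -> int:
    """Ask the OS to write buf to fd; return how many bytes were written."""
    return os.write(fd, buf)


if __name__ == "__main__":
    # fd 1 is stdout, so this mirrors write(1, hello, 13).
    write_message(1, message)
```

The three arguments line up with a0 (the file descriptor), a1 (the buffer) and a2 (the length, which Python takes from the bytes object itself).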
Finally, the ecall instruction invokes the system call, and the a7 register determines which system call is invoked. We stored the value 64 to select the write system call. The instruction transfers control to the operating system, and control returns to our code once the system call is done.
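The role of a7 can be pictured as a lookup: the operating system reads the number and runs the matching routine. The sketch below is a toy model in Python, not how a kernel is actually implemented. The regs dictionary, the ecall function and the constant names are hypothetical, and a1 holds the bytes themselves instead of an address to keep the model small.

```python
import os

SYS_WRITE = 64  # system call numbers used in the assembly above
SYS_EXIT = 93


def ecall(regs: dict) -> None:
    """Toy model: pick a system call from a7 and take arguments from a0..a2."""
    number = regs["a7"]
    if number == SYS_WRITE:
        fd, buf, n = regs["a0"], regs["a1"], regs["a2"]
        # On RISC-V Linux the result of a system call comes back in a0.
        regs["a0"] = os.write(fd, buf[:n])
    elif number == SYS_EXIT:
        raise SystemExit(regs["a0"])
    else:
        raise NotImplementedError(f"system call {number} is not modelled")


if __name__ == "__main__":
    # Mirrors the register setup before the first ecall in the program.
    ecall({"a0": 1, "a1": b"Hello World!\n", "a2": 13, "a7": SYS_WRITE})
```

Calling ecall with a7 set to 93 and a0 set to 0 would end the Python process with status 0, which is the counterpart of exit(0).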
Similarly, the following code invokes the exit system call.
# invoke exit(0) system call
li a0, 0
li a7, 93
ecall
Entry Point
The label _start: in the following lines has a special meaning.
.text
.global _start
_start:
It represents the entry point where the program begins.
When compiling assembly code, we must tell the linker where the entry point is. This is exactly what the .global directive does: it makes the _start label globally visible.
Label
_start is called a label, and each label represents an address in the program. We can confirm this by disassembling the compiled binary.
$ riscv64-linux-gnu-objdump --disassemble hello.out
In the middle of the output, we can see the following.
Disassembly of section .text:
00000000000100e8 <_start>:
As shown here, the _start label corresponds to a concrete address, which was determined during the linking process.
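Since a label is just a name for an address, the pair can be pulled out of the disassembly mechanically. The following Python sketch parses a label line in the format shown above; parse_label and the regular expression are illustrative helpers, not part of any toolchain.

```python
import re

# Matches lines such as "00000000000100e8 <_start>:" from the objdump output.
LABEL_LINE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:$")


def parse_label(line: str):
    """Return (name, address) for a label line, or None for any other line."""
    match = LABEL_LINE.match(line.strip())
    if match is None:
        return None
    return match.group(2), int(match.group(1), 16)


if __name__ == "__main__":
    name, address = parse_label("00000000000100e8 <_start>:")
    print(name, hex(address))  # _start 0x100e8
```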
Sections
Now that we’ve disassembled the compiled binary, let’s take a closer look at the rest of the output to understand the binary as a whole.
In the .text section, we can see instructions in the binary.
00000000000100e8 <_start>:
100e8: 00100513 li a0,1
100ec: 00001597 auipc a1,0x1
100f0: 02058593 addi a1,a1,32 # 1110c <__DATA_BEGIN__>
100f4: 00d00613 li a2,13
100f8: 04000893 li a7,64
100fc: 00000073 ecall
Clearly, this code corresponds to the part of the assembly code that invokes the write system call. The la pseudo-instruction turned into auipc and addi, but we won't discuss the details yet. However, note that it refers to the address 1110c, labeled as __DATA_BEGIN__.
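Although the details of auipc and addi are left for later, the arithmetic behind the address 1110c can already be checked. The sketch below, in Python, assumes the standard RISC-V encodings, in which the upper 20 bits of an auipc word and the upper 12 bits of an addi word hold the immediates; the helper names are made up for this example.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide on RV64


def auipc_offset(word: int) -> int:
    """Return the sign-extended offset encoded in an auipc instruction word."""
    offset = word & 0xFFFFF000  # imm[31:12], already in position
    return offset - (1 << 32) if offset & 0x80000000 else offset


def addi_immediate(word: int) -> int:
    """Return the sign-extended 12-bit immediate of an addi instruction word."""
    imm = (word >> 20) & 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def load_address(pc: int, auipc_word: int, addi_word: int) -> int:
    """Compute the address built by an auipc/addi pair; pc is auipc's address."""
    upper = (pc + auipc_offset(auipc_word)) & MASK64
    return (upper + addi_immediate(addi_word)) & MASK64


if __name__ == "__main__":
    # Instruction words and the auipc address come from the disassembly above.
    print(hex(load_address(0x100EC, 0x00001597, 0x02058593)))  # 0x1110c
```

In words: auipc adds 0x1000 to its own address 100ec, giving 110ec, and addi adds 32 (0x20) to reach 1110c.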
In the .data section, we can see this label.
Disassembly of section .data:
000000000001110c <__DATA_BEGIN__>:
1110c: 6548 .insn 2, 0x6548
1110e: 6c6c .insn 2, 0x6c6c
11110: 6f57206f j 84004 <__global_pointer$+0x726f8>
11114: 6c72 .insn 2, 0x6c72
11116: 2164 .insn 2, 0x2164
11118: 000a .insn 2, 0x000a
You may notice that this data came from the Hello World! string: for example, 0x48 and 0x65 in the first line are the ASCII codes for H and e. The disassembled output looks strange because the disassembler attempted to interpret the data as instructions.
.data
hello: .asciz "Hello World!\n"
The .asciz directive declares a null-terminated string: the assembler appends a zero byte after the last character.
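The link between the odd-looking .data disassembly and the string can be verified by writing the listed values back out as bytes. This Python sketch assumes little-endian byte order, which is what RISC-V Linux targets use; the units list is copied by hand from the .data listing above.

```python
# (value, size in bytes) for each line of the .data disassembly, in order.
units = [
    (0x6548, 2),
    (0x6C6C, 2),
    (0x6F57206F, 4),
    (0x6C72, 2),
    (0x2164, 2),
    (0x000A, 2),
]

# Each value is stored least significant byte first.
data = b"".join(value.to_bytes(size, "little") for value, size in units)

if __name__ == "__main__":
    print(data)  # b'Hello World!\n\x00'
```

The final \x00 is the terminator added by .asciz. The length 13 passed to write leaves it out, so only the visible text and the newline are printed.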