149 Byte Hello World, and How You Can Make Your Rust Project Smaller
A quick note about this post: this is the first post I ever wrote, and while I am not exactly ashamed of it, I am not proud of it either. A rewrite has been in limbo for a couple of months now, but I am too lazy to delete this one and start over, so I've only edited it a bit to make it more readable.
I’ve been size coding some Rust, trying to make the smallest possible “Hello World!” program.
After a few days of tinkering, I've managed to squeeze it down to a mere 149-byte Hello World!, (probably) the smallest 64-bit one out there written in Rust.
I got bored of just printing, so I also made a 722-byte brainfuck interpreter.
There are two main parts to this post: the one you can use to make smaller binaries, and the batshit byte mucking-around-and-finding-out I used to make this one so small.
To get started, I will create a new empty project and compare binary sizes.
```sh
cargo init --vcs=none shl
```
In main.rs, we are simply going to print out Hello World! and quit.
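A minimal sketch of that main.rs, assuming it is just a println! call; the `message` helper is hypothetical and only exists so the byte count can be checked:

```rust
/// Hypothetical helper: the real main.rs only needs the println! call,
/// this just makes the message easy to check.
fn message() -> &'static str {
    "Hello World!"
}

fn main() {
    // println! appends '\n': 12 visible characters + 1 newline = 13 bytes.
    println!("{}", message());
}
```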
```sh
cargo b
cargo b --release
ls -lh target/*/shl
```
Building the project in debug mode yields a 3.4MB binary, which is inexcusably big; thankfully, release mode reduces the size to 390KB. But still, 390,000 bytes to print a 13-byte string?
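If you would rather track sizes from Rust than with ls, here is a small sketch using `std::env::current_exe` and `std::fs::metadata`; the `own_binary_size` helper is hypothetical and not part of the project. It reports the size of whatever binary it is compiled into:

```rust
use std::{env, fs, io};

/// Size in bytes of the currently running executable,
/// i.e. the number `ls -l` would show for it.
fn own_binary_size() -> io::Result<u64> {
    let path = env::current_exe()?;
    Ok(fs::metadata(path)?.len())
}

fn main() {
    match own_binary_size() {
        Ok(bytes) => println!("this binary is {} bytes", bytes),
        Err(e) => eprintln!("could not stat own binary: {}", e),
    }
}
```

Building it once with `cargo b` and once with `cargo b --release` shows the same debug/release gap as `ls`.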
The only other file automatically generated for us is the Cargo.toml manifest, which is where the following size settings go.
For more information about them, check out the official Cargo Book.
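`opt-level` sets the optimization level; here we want it to optimize for size.
Valid options are `0` to `3`, `"s"` and `"z"`: `"s"` optimizes for binary size, and `"z"` does too while also turning off loop vectorization.
Level `3` is the default in release mode, so there's no point defining it explicitly.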
`lto` turns on Link Time Optimization. The optimization level mentioned before works on each translation unit (in Rust, a crate, possibly split into several codegen units) separately: each unit is optimized and compiled on its own, and only then are they linked (merged) into one binary. LTO instead optimizes all of the code at once at link time, which lets it inline and remove code across unit boundaries where the compiler alone could not.
“Fat” LTO refers to LLVM's original way of doing LTO, while “thin” LTO is newer, more parallel, quicker to build, and sometimes even produces faster code. I tried both, but “fat” LTO was smaller.
`strip` removes debug symbols, including variable, function and type names, source file names, and line references. Useful to turn on, unless your code is crashing in production and you need to debug it.
`codegen-units = 1` stops the compiler from splitting compilation into multiple smaller parts. This increases the effectiveness of the other options at the cost of compilation speed.
`panic = "abort"`: instead of unwinding the stack on panic (running destructors on the way up), simply abort the process. This removes the unwinding code from your own crates.
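To see what `panic = "abort"` takes away, here is a sketch that reports the compiled panic strategy via `cfg!(panic = "...")` (stable since Rust 1.60) and, under the default unwind strategy, shows a destructor running during unwinding. `Guard`, `DROPPED` and the helper functions are hypothetical names:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Reports the panic strategy this crate was compiled with:
/// `panic = "abort"` in the Cargo profile sets `cfg(panic = "abort")`.
fn panic_strategy() -> &'static str {
    if cfg!(panic = "abort") {
        "abort"
    } else {
        "unwind"
    }
}

static DROPPED: AtomicBool = AtomicBool::new(false);

/// A value whose destructor records that it ran.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

/// With unwinding, the panic walks back up the stack, runs Guard's
/// destructor, and is then caught by catch_unwind. Under panic = "abort"
/// the process would terminate at the panic instead, so only call this
/// when the strategy is "unwind".
fn unwinding_runs_destructors() -> bool {
    let result = panic::catch_unwind(|| {
        let _guard = Guard;
        panic!("boom");
    });
    result.is_err() && DROPPED.load(Ordering::SeqCst)
}

fn main() {
    println!("panic strategy: {}", panic_strategy());
    if panic_strategy() == "unwind" {
        println!("destructors ran during unwind: {}", unwinding_runs_destructors());
    }
}
```

The landing pads that make this possible are exactly the code `panic = "abort"` lets the compiler leave out.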
Put together, these settings make up the entire release profile in Cargo.toml, ready to drop into your own projects.
With these, the size drops further to… 287KB (from 390KB). You would see a bigger reduction if the application actually did something, and it would also run faster.
NOTE: If you're trying to make something really high-performance, check out Profile-guided Optimization.
NOTE: If you want LTO as an optional compilation feature (e.g. because it takes too long on CI/CD servers), you can make custom profiles that inherit from other profiles using the `inherits` key. Composition over Inheritance, anyone?
The STDs and STDon'ts
The Rust Standard Library (or `std` for short) is very useful if you don't want to reinvent the wheel, and it is plenty fast, but that comes at the cost of size.
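For those who don't know, the standard library is statically linked into every binary. It also ships precompiled, so your size settings don't apply to it, and even a single println pulls in formatting, panic handling and backtrace code. When this becomes a problem, you have two options: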
Build it
The reasonable thing to try first is rebuilding the standard library with your size settings. You will need the nightly toolchain (which might also make your files smaller and code faster), its rust-src component (`rustup component add rust-src --toolchain nightly`), and lots of patience.
```sh
cargo +nightly build --release -Z build-std=panic_abort,std -Z build-std-features="optimize_for_size" --target x86_64-unknown-linux-gnu
ls -l target/x86_64-unknown-linux-gnu/release/shl
```
This works out to a nice 43KB, small enough that you might be able to make a static web server for an RP2354.
Ditching it
Simply adding `#![no_std]` at the top of main.rs will completely remove the standard library. You still have access to C functions like `printf` from libc, and there are many crates that work without `std`.
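Without the standard library, you will also need to define your own panic handler, and since the usual `main` setup is part of `std`, the binary needs `#![no_main]` and an entry point of its own.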
As there is no `println` macro to use anymore, I will import libc and use `printf`.
Building with the nightly toolchain yields a 14,224-byte (about 14KB) file. Not bad, but not great either, as you still rely on libc.
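As a sketch of just the printf part: this declares printf by hand instead of going through the libc crate, and runs inside an ordinary std program so it works as-is. `hello_via_printf` is a hypothetical name, and the `unsafe extern` block assumes Rust 1.82 or newer:

```rust
use std::os::raw::{c_char, c_int};

// Hand-written declaration of libc's printf, roughly what the libc crate
// exposes. `unsafe extern` is required by edition 2024 and accepted by
// Rust 1.82+ in every edition.
unsafe extern "C" {
    fn printf(format: *const c_char, ...) -> c_int;
}

/// Prints the greeting through C's printf and returns its result:
/// the number of bytes written, or a negative value on error.
fn hello_via_printf() -> c_int {
    // C expects NUL-terminated strings; Rust literals are not, so add \0.
    let format = b"%s\0";
    let msg = b"Hello World!\n\0";
    unsafe { printf(format.as_ptr() as *const c_char, msg.as_ptr() as *const c_char) }
}

fn main() {
    hello_via_printf();
}
```

Passing the message through a `"%s"` format keeps any `%` in it from being interpreted by printf.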
Cutting Down to the Bone (and beyond)
We have reached the furthest point that everyday projects can go, but there's still plenty of fat we can shave off.
At this stage, the ELF file has 25 sections, even though it only needs a few. The easiest way to get rid of most of them is to pass some extra flags to the Rust compiler:
```sh
export RUSTFLAGS="-Ctarget-cpu=native -Clink-args=-nostartfiles -Crelocation-model=static \
  -Clink-args=-Wl,-n,-N,--no-dynamic-linker,--no-pie,--build-id=none,--no-eh-frame-hdr"
```
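We tell rustc to target the native CPU (enabling every instruction set extension it supports, such as SSE3 and beyond), not to link the C startup files (`-nostartfiles`, so our own `_start` becomes the entry point), and to generate non-position-independent code (`-Crelocation-model=static`). The linker flags tell ld not to page-align sections (`-n`, `-N`), not to request a dynamic linker, not to produce a position-independent executable, and not to emit a build ID or an `.eh_frame_hdr` section.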
Since we no longer use libc, we need inline assembly and raw Linux syscalls to write to the standard output.
```rust
#![no_std]
#![no_main]

use core::arch::asm;

const MSG: &str = "Hello, World!\n";

// The kernel jumps here directly: there are no C-style arguments
// (argc/argv live on the stack), and returning would crash, so it never returns.
#[no_mangle]
pub extern "C" fn _start() -> ! {
    write_to_std_out(MSG.as_ptr(), MSG.len());
    exit(0);
}

fn write_to_std_out(string_pointer: *const u8, string_length: usize) {
    unsafe {
        asm!(
            "syscall",
            in("rax") 1, // write syscall number
            in("rdi") 1, // stdout file descriptor, 2 is stderr
            in("rsi") string_pointer,
            in("rdx") string_length,
            out("rcx") _, // clobbered by syscalls
            out("r11") _, // clobbered by syscalls
            lateout("rax") _, // clobbered by syscalls (holds the return value),
            // if you can't print more than once, you are missing this
        );
    }
}

fn exit(code: i32) -> ! {
    unsafe {
        asm!(
            "syscall",
            in("rax") 60, // exit syscall number
            in("rdi") code,
            options(noreturn)
        )
    }
}

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}
```
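The write wrapper above throws rax away, but that is where the kernel leaves write's return value. A sketch of a variant that keeps it, usable from an ordinary std program; it assumes x86_64 Linux, and `sys_write` is a hypothetical name:

```rust
use std::arch::asm;

/// Raw write(2), assuming x86_64 Linux.
/// Returns the number of bytes written, or a negated errno on failure.
fn sys_write(fd: usize, buf: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            // rax carries the syscall number (1 = write) in and the result out
            inlateout("rax") 1isize => ret,
            in("rdi") fd,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            // the syscall instruction itself overwrites rcx and r11
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret
}

fn main() {
    // Two writes in a row work because rax is declared as an output.
    let first = sys_write(1, b"Hello, World!\n");
    let second = sys_write(1, b"Hello again!\n");
    println!("wrote {} and {} bytes", first, second);
}
```

`inlateout` expresses the input and the clobber of rax in a single operand, which is the same thing the `in`/`lateout` pair in the original listing does.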
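With this, we get an even smaller file of only 624 bytes and 7 sections, which is still 7 too many: the kernel only looks at the program headers when loading a binary, so sections aren't needed to run it at all.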
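To watch the section count (and later the program header count) drop without opening a hex editor, here is a sketch that reads a few fields of the fixed 64-byte ELF64 header. Offsets follow the ELF64 header layout; a little-endian 64-bit file is assumed, and `ElfSummary`/`parse_elf64_header` are hypothetical names:

```rust
use std::{env, fs};

/// A few fields from a 64-bit little-endian ELF header.
#[derive(Debug, PartialEq)]
struct ElfSummary {
    entry: u64, // e_entry: address of the first instruction
    phnum: u16, // e_phnum: program headers (what the kernel loads)
    shnum: u16, // e_shnum: section headers (not needed to run)
}

/// Parses the fixed 64-byte ELF64 header. Returns None for anything
/// that isn't a little-endian 64-bit ELF file.
fn parse_elf64_header(bytes: &[u8]) -> Option<ElfSummary> {
    if bytes.len() < 64 || !bytes.starts_with(b"\x7fELF") {
        return None;
    }
    // e_ident[EI_CLASS] == ELFCLASS64, e_ident[EI_DATA] == ELFDATA2LSB
    if bytes[4] != 2 || bytes[5] != 1 {
        return None;
    }
    let u16_at = |off: usize| u16::from_le_bytes([bytes[off], bytes[off + 1]]);
    let u64_at = |off: usize| {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&bytes[off..off + 8]);
        u64::from_le_bytes(buf)
    };
    Some(ElfSummary {
        entry: u64_at(24), // e_entry
        phnum: u16_at(56), // e_phnum
        shnum: u16_at(60), // e_shnum
    })
}

fn main() {
    // Example usage: inspect this very program.
    let exe = env::current_exe().expect("no path to own executable");
    let bytes = fs::read(exe).expect("could not read own executable");
    println!("{:?}", parse_elf64_header(&bytes));
}
```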
We can remove the .comment section with a simple command:
```sh
objcopy -R .comment target/release/shl target/release/shl
```
Now we are even closer: just 496 bytes.
Looking at the file in a hex editor such as ImHex, you may notice quite a lot of padding after the actual code.
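We can remove it with the sstrip utility from the ELFkickers collection, which strips everything at the end of the file that isn't part of the program's memory image:
```sh
sstrip target/release/shl
```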
That's better: we are now at 215 bytes.
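A sketch of the core idea behind that step, not sstrip's actual implementation: find the file offset just past the last byte the kernel maps, i.e. the largest `p_offset + p_filesz` over all program headers. Anything after it is dead weight on disk. Little-endian ELF64 is assumed, and `loaded_end` is a hypothetical name:

```rust
use std::{env, fs};

/// Returns the offset just past the last byte covered by a program header
/// (never less than the end of the ELF header and program header table).
fn loaded_end(bytes: &[u8]) -> Option<u64> {
    // Only little-endian ELF64 (EI_CLASS = 2, EI_DATA = 1).
    if bytes.len() < 64 || !bytes.starts_with(b"\x7fELF") || bytes[4] != 2 || bytes[5] != 1 {
        return None;
    }
    let u16_at = |off: usize| u16::from_le_bytes([bytes[off], bytes[off + 1]]);
    let u64_at = |off: usize| -> Option<u64> {
        let slice = bytes.get(off..off.checked_add(8)?)?;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(slice);
        Some(u64::from_le_bytes(buf))
    };
    let phoff = u64_at(32)?; // e_phoff
    let phentsize = u64::from(u16_at(54)); // e_phentsize (56 for ELF64)
    let phnum = u64::from(u16_at(56)); // e_phnum
    // The ELF header and the program header table must themselves be kept.
    let mut end = phoff.checked_add(phentsize * phnum)?.max(64);
    if end > bytes.len() as u64 {
        return None;
    }
    for i in 0..phnum {
        let ph = (phoff + i * phentsize) as usize;
        let p_offset = u64_at(ph + 8)?; // p_offset
        let p_filesz = u64_at(ph + 32)?; // p_filesz
        end = end.max(p_offset.checked_add(p_filesz)?);
    }
    Some(end)
}

fn main() {
    let exe = env::current_exe().expect("no path to own executable");
    let bytes = fs::read(exe).expect("could not read own executable");
    match loaded_end(&bytes) {
        Some(end) => println!("{} bytes on disk, loaded part ends at {}", bytes.len(), end),
        None => println!("not a little-endian ELF64 file"),
    }
}
```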
The only thing left to do now is to transplant the ELF header.
Not only can we remove one program header, we can also store data in some unused header fields. I wrote a simple Python script that does exactly that; it's available in the repo.
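The Python script's exact changes aren't shown here. As a hedged illustration of storing data in unused fields: the ELF specification reserves bytes 9 to 15 of `e_ident` (EI_PAD) as padding, and the sketch below writes a payload there. `stash_in_ei_pad` is a hypothetical name, and which other fields a loader actually ignores is not covered:

```rust
/// e_ident layout: bytes 0..9 hold the magic, class, data encoding,
/// version, OS ABI and ABI version; bytes 9..16 (EI_PAD) are reserved
/// padding that the spec says should be zero.
const EI_PAD: usize = 9;
const EI_NIDENT: usize = 16;

/// Hypothetical helper: hides up to 7 bytes of payload in e_ident's padding.
/// Returns false and leaves the header untouched if the payload doesn't fit.
fn stash_in_ei_pad(header: &mut [u8], payload: &[u8]) -> bool {
    if header.len() < EI_NIDENT || payload.len() > EI_NIDENT - EI_PAD {
        return false;
    }
    header[EI_PAD..EI_PAD + payload.len()].copy_from_slice(payload);
    true
}

fn main() {
    // A zeroed stand-in for a real 64-byte ELF64 header.
    let mut header = [0u8; 64];
    header[..4].copy_from_slice(b"\x7fELF");
    let ok = stash_in_ei_pad(&mut header, b"Hello!\n");
    println!("stashed: {}, e_ident = {:?}", ok, &header[..EI_NIDENT]);
}
```

Seven bytes is not enough for the whole message, which is why tiny-ELF tricks usually spread data over several fields.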
Looking at it now, we have made a 149-byte binary from scratch. Not only that, all of the jankier modifications are scripted, so they can be repeated at any time without any manual editing.
If you want to play around with the binary, it's on GitHub under the WTFPL.
Ideas
- Put the whole message into one buffer, one less syscall
- Remove ud2 instructions
- Listen to some good music
- Make a Rust 4k chess engine?
- ???
Further Reading and Resources
My teeny little project stands on the shoulders of giants; here are the resources I used while making it.
- https://mainisusuallyafunction.blogspot.com/2015/01/151-byte-static-linux-binary-in-rust.html
- https://dev.to/szymongib/single-syscall-hello-world-in-rust-part-2-4jj4
- https://os.phil-opp.com/freestanding-rust-binary/
- https://darkcoding.net/software/a-very-small-rust-binary-indeed/
- https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
- https://www.muppetlabs.com/~breadbox/software/tiny/revisit.html
- https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html
- http://timelessname.com/elfbin/
- https://in4k.github.io/wiki/linux
- https://jacobgw.com/blog/zig/low-level/2021/03/15/elf-linux.html
- https://arusahni.net/blog/2020/03/optimizing-rust-binary-size.html