149 Byte Hello World, and How You Can Make Your Rust Project Smaller
I’ve been playing with some simple size coding in Rust over the last several weeks, getting as far as a mere 149 Byte Hello World!
This post will first focus on compiling smaller binaries for any Rust project, and then focus on the size coding techniques I used to get the 149
Byte Hello World!
Using this, I was also able to make a 1005 byte brainfuck interpreter.
To get started, I will create a new empty project, and compare the binary size.
cargo init --vcs=none shl
In the main.rs
we are simply going to print out Hello World! and quit.
fn main() {
    println!("Hello, World!");
}
cargo b
/bin/ls -l target/debug/shl
Building the project in debug mode yields a 4599104 byte (or 4.4MB) binary, which is inexcusably big. Even worse, building in release mode didn’t change the size.
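To compare sizes between steps without squinting at ls output, a small helper can do the conversion. This is only a sketch: the names human_size and file_size are made up for illustration, and it assumes binary units (1 KiB = 1024 bytes), which is what the 4.4MB figure above corresponds to.

```rust
/// Format a byte count using binary units (1 KiB = 1024 bytes).
/// Hypothetical helper, not part of the project described in this post.
fn human_size(bytes: u64) -> String {
    const KIB: f64 = 1024.0;
    const MIB: f64 = 1024.0 * 1024.0;
    let b = bytes as f64;
    if b >= MIB {
        format!("{:.1} MiB", b / MIB)
    } else if b >= KIB {
        format!("{:.1} KiB", b / KIB)
    } else {
        format!("{} B", bytes)
    }
}

/// Read the on-disk size of a file such as target/debug/shl.
fn file_size(path: &str) -> std::io::Result<u64> {
    Ok(std::fs::metadata(path)?.len())
}
```

Calling human_size(file_size("target/debug/shl")?) after each build step gives a quick before/after comparison.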
5 Quick Size Reductions
These are 5 very simple flags you can paste into your Cargo.toml
to instantly decrease the size of your program and maybe even boost the performance. I’m going to assume you require the standard library, need a quick fix and do not want to mess around with size coding.
For further information about build profiles as well as these flags, check out the official Cargo Book.
[profile.release]
This simply identifies that the following flags are only for the release profile, activated with --release
.
# Optimize for size
opt-level = "z"
# OR Optimize for performance
opt-level = "3"
This setting selects the optimization level; "z" optimizes for size. There are 6 levels in total (0, 1, 2, 3, "s" and "z").
Level 3
is the default in release mode, so there’s no point explicitly defining it.
# Optimizations across all crates within the dependency graph.
lto = "fat"
# Similar to "fat", but takes substantially less time to run
lto = "thin"
Link Time Optimization is still black magic for me; all you need to know is that it works and makes your program faster. Here’s the Cargo Reference page and Wikipedia page if you want to know more.
strip = true
Removes Debug symbols, including variable, function, and class names, source code file names and line references. Always useful to turn on, unless your code is crashing in production.
codegen-units = 1
Does not split your crate into multiple smaller units during code generation, which increases the effectiveness of the other options.
panic = "abort"
Instead of unwinding the stack and providing a stack trace on panic, simply abort. Only useful if you know crashes are rare and you don’t need a stack trace.
After using all of these flags, our Cargo.toml
file looks like this.
[package]
name = "shl"
version = "0.1.0"
edition = "2021"
[profile.release]
opt-level = "z" # Optimize for size.
lto = true # Use "fat" LTO
strip = true # Remove debug symbols
codegen-units = 1 # Compile in one big chunk
panic = "abort" # Do not provide a stack trace on panic
Compiling with release mode yields a 325944
byte (or 318 KB
) large file. Better, but there’s still more to go.
The STD
The Rust Standard Library (or std
for short) is very useful if you don’t want to reinvent the wheel, and is plenty fast, but at the cost of size. For those who don’t know, the entire standard library is statically linked when compiling, even if you don’t need the entirety of it. When this becomes a problem, you have two options:
Build it
The reasonable thing you might want to try is to rebuild the standard library, although you will need the nightly toolchain (which might also make your files smaller and code faster) and lots of patience.
cargo build --release -Z build-std --target x86_64-unknown-linux-gnu
Ditching it
Simply adding
#![no_std]
to the top of your main.rs
will completely remove the standard library. You still have access to functions like printf
from libc
, and there are many crates that work without std.
Without the standard library, you will need to define your own panic handler.
For this example, I will also ditch the main function, and define my own.
#![no_main]
#![no_std]
#![feature(rustc_private)]
extern crate libc;

// The C runtime calls this symbol, so it must not be mangled.
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const u8) -> isize {
    // printf expects a NUL-terminated C string.
    const HELLO: &str = "Hello, World!\n\0";
    unsafe {
        libc::printf(HELLO.as_ptr() as *const _);
    }
    0
}

// Required without std: there is no default panic handler.
#[panic_handler]
fn my_panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
Building with the nightly toolchain yields a 14224
byte (or 14 KB
) large file. Not bad, but not great either.
Cutting Down to the Bone (and beyond)
We have reached the furthest point where everyday projects can go, but if you’re a nerd like me, there’s plenty of fat we can shave off.
At this stage, the executable has 25 sections, even though you only need a few. The easiest way to get rid of them is to define some Rust compiler flags.
export RUSTFLAGS="-Ctarget-cpu=native -Clink-args=-nostartfiles -Crelocation-model=static
-Clink-args=-Wl,-n,-N,--no-dynamic-linker,--no-pie,--build-id=none,--no-eh-frame-hdr"
We tell rustc to target the native CPU (i.e. enable extensions like SSE3), to leave out the C startup files (and with them libc), and to use a static relocation model. We also pass linker flags that tell ld not to page-align sections, not to produce a position-independent executable, and to omit the build ID.
Since we have removed libc
, we need to use inline assembly to write to the standard output.
#![no_std]
#![no_main]

use core::arch::asm;

const MSG: &str = "Hello, World!\n";

// With -nostartfiles the linker jumps straight to _start.
#[no_mangle]
pub extern "C" fn _start(_argc: isize, _argv: *const *const u8) {
    write_to_std_out(MSG.as_ptr(), MSG.len());
    exit(0);
}

fn write_to_std_out(string_pointer: *const u8, string_length: usize) {
    unsafe {
        asm!(
            "syscall",
            in("rax") 1, // write syscall number
            in("rdi") 1, // stdout file descriptor, 2 is stderr
            in("rsi") string_pointer,
            in("rdx") string_length,
            out("rcx") _, // clobbered by syscalls
            out("r11") _, // clobbered by syscalls
            lateout("rax") _, // clobbered by syscalls,
            // if you can't print more than once, you are missing this
        );
    }
}

fn exit(code: i32) {
    unsafe {
        asm!(
            "syscall",
            in("rax") 60, // exit syscall number
            in("rdi") code,
            options(noreturn)
        );
    }
}

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}
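The write wrapper above throws away the value the kernel leaves in rax. As a hedged variation (not what the 149 byte build uses, since error handling costs bytes), the same syscall can hand back the result. The name sys_write is made up for this sketch, and it assumes x86-64 Linux, where write is syscall number 1 and a failure comes back as a negative errno.

```rust
use core::arch::asm;

/// Sketch of a `write` wrapper that returns the kernel's raw result:
/// the number of bytes written, or a negative errno value on failure.
/// Assumes x86-64 Linux; `sys_write` is an illustrative name.
fn sys_write(fd: u32, buf: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            // rax carries the syscall number in and the result out.
            inlateout("rax") 1isize => ret,
            in("rdi") fd as usize,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            out("rcx") _, // clobbered by syscalls
            out("r11") _, // clobbered by syscalls
        );
    }
    ret
}
```

A call such as sys_write(1, b"Hello, World!\n") returns 14 when the whole message was written; a short count or a negative value means the caller would have to retry or bail out.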
With this, we get an even smaller file with only 624
bytes and 7 sections, which is still 7 too many.
We can remove the comment section with a simple command.
objcopy -R .comment target/release/shl target/release/shl
Now we are even closer, just 496
bytes.
Looking at the file in a hex editor such as ImHex, you may notice quite a lot of padding after the actual code.
We can remove this by using sstrip
from the ELFkickers collection.
sstrip target/release/shl
That’s better, we are now at 215
bytes.
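Roughly speaking, sstrip keeps only the part of the file that the program headers actually refer to and cuts off everything after it. The sketch below is not sstrip's implementation, just an approximation of that idea for little-endian ELF64 files; the function names are invented for illustration.

```rust
/// Read a little-endian u16 at `off`, or None if out of bounds.
fn read_u16(b: &[u8], off: usize) -> Option<u16> {
    let s = b.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

/// Read a little-endian u64 at `off`, or None if out of bounds.
fn read_u64(b: &[u8], off: usize) -> Option<u64> {
    let s = b.get(off..off.checked_add(8)?)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(s);
    Some(u64::from_le_bytes(raw))
}

/// File offset just past the last byte referenced by the ELF header,
/// the program header table, or any segment. Bytes beyond this point
/// are not needed to load the program.
fn end_of_segments(elf: &[u8]) -> Option<u64> {
    // Require the ELF magic, 64-bit class (2) and little-endian data (1).
    if elf.len() < 64 || !elf.starts_with(b"\x7fELF") || elf[4] != 2 || elf[5] != 1 {
        return None;
    }
    let phoff = read_u64(elf, 0x20)?; // e_phoff
    let phentsize = read_u16(elf, 0x36)? as u64; // e_phentsize
    let phnum = read_u16(elf, 0x38)? as u64; // e_phnum

    // The header table itself has to survive the truncation.
    let mut end = phoff.checked_add(phentsize * phnum)?.max(64);
    for i in 0..phnum {
        let ph = (phoff + i * phentsize) as usize;
        let p_offset = read_u64(elf, ph.checked_add(0x08)?)?;
        let p_filesz = read_u64(elf, ph.checked_add(0x20)?)?;
        end = end.max(p_offset.checked_add(p_filesz)?);
    }
    Some(end)
}
```

Truncating the file to the returned offset drops the section header table and any trailing data, which is where most of the bytes between 496 and 215 went.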
The only thing left to do now is to transplant the ELF header.
Not only can we remove one program header, but we can also store data in some unused fields. I wrote a simple Python script that does exactly that, and it is available in the smallest-hello-rs GitHub repository.
import struct

with open('shl', 'rb+') as file:
    data = bytearray(file.read())
    # Calculate the entry point offset
    file.seek(0x18)
    entry_point_offset: int = struct.unpack('B', file.read1(1))[0]
    print(
        f"Entry point is at byte {entry_point_offset} ({hex(entry_point_offset)})")
    file.seek(entry_point_offset)
    bytes_after_entry_point: bytes = file.read1(len(data))
    # Clear the existing ELF header
    file.seek(0)
    file.truncate()
    data = bytes([
        0x7F, 0x45, 0x4C, 0x46,  # 4b, Header
        0x02,  # 1b, class, 64bit
        0x01,  # 1b, endianness, LE
        0x01,  # 1b, ELF Version
        0x00,  # 1b, ABI type, normally 0 (SystemV), probably fine to overwrite with data
        0x00,  # 1b, ABI version, normally 0, probably fine to overwrite with data
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  # 7b, e_ident padding
        0x02, 0x00,  # 2b, e_type, executable
        0x3E, 0x00,  # 2b, e_machine, AMD x86-64
        0x01, 0x00, 0x00, 0x00,  # 4b, e_version
        # 8b, e_entry, entry point
        entry_point_offset - 32 - 16 - 16, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
        # 8b, e_phoff, start of program header
        0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        # 8b, e_shoff, start of section header table
        # unused here, so the last 5 bytes hold "Hello"
        0x00, 0x00, 0x00, 0x48, 0x65, 0x6C, 0x6C, 0x6F,
        # 4b, e_flags
        0x00, 0x00, 0x00, 0x00,
        # 2b, e_ehsize, Contains the size of this header, normally 64 Bytes for 64-bit and 52 Bytes for 32-bit format
        0x40, 0x00,
        # 2b, e_phentsize, size of a program header entry, 56 bytes (0x38)
        0x38, 0x00,
        # 2b, e_phnum, number of entries in program header table
        0x01, 0x00,
        # 2b, e_shentsize, size of section header table entry
        0x40, 0x00,
        # 2b, e_shnum, number of section header entries
        0x00, 0x00,
        # 2b, e_shstrndx
        0x00, 0x00,
        ### /// PROGRAM HEADERS ///
        # 4b, p_type, 1 for loadable
        0x01, 0x00, 0x00, 0x00,
        # 4b, p_flags
        # 0x07 = +RWX
        # The rest is data
        0x07, 0x00, 0x00, 0x00,
        # 8b, p_offset, offset of the segment in the file image
        0xB0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        # 8b, p_vaddr, virtual addr of segment in memory start of this segment
        0xB0, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
        # 8b, p_paddr, same as vaddr except on physical systems
        #0xB0, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
        # Seems it's fine replacing it with data? These bytes are ", World!"
        0x2C, 0x20, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x21,
        # 8b, p_filesz
        # The first byte was 0x25 but I overwrote it into a newline (0A)
        0x0A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        # 8b, p_memsz
        0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        # 8b, p_align
        # it doesn't segfault when I comment it out so f it
        #0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    ])
    file.write(data)
    file.write(bytes_after_entry_point)
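The script reads only the single byte at 0x18 and treats it as a file offset. That works here because the file is shorter than 256 bytes and the load address used in the headers (0x400000) has a zero low byte. A more general sketch would read the full 8-byte e_entry field and subtract the load base. The function names are made up, and the sketch assumes file offset 0 is mapped at the given base, as it is in the headers above.

```rust
/// Read e_entry (8 bytes, little-endian, at offset 0x18) from an ELF64 header.
fn entry_point(elf: &[u8]) -> Option<u64> {
    let field = elf.get(0x18..0x20)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(field);
    Some(u64::from_le_bytes(raw))
}

/// Convert the virtual entry address into a file offset, assuming that
/// file offset 0 is mapped at `load_base` (0x400000 in this post).
/// Returns None if the entry point lies below the load base.
fn entry_file_offset(elf: &[u8], load_base: u64) -> Option<u64> {
    entry_point(elf)?.checked_sub(load_base)
}
```

For an entry point of 0x4000B0 this yields 0xB0, the same value the script gets from the low byte alone.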
Looking at it now, we have made a 149
byte binary from scratch, and not only that, we have made it programmatically so it can be repeated at any time, without manual modifications.
If you want to play around with the binary in a hex editor, it’s on GitHub under the WTFPL.
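As a rough sanity check on the 149 byte figure, the sizes can be added up from the script: a 64 byte ELF header plus a 56 byte program header with the 8 byte p_align field left out. The constant and function names below are only for illustration.

```rust
const ELF64_HEADER: usize = 64; // e_ehsize in the script (0x40)
const PROGRAM_HEADER: usize = 56; // e_phentsize in the script (0x38)
const P_ALIGN: usize = 8; // field the script leaves out

// Message bytes hidden in header fields: "Hello" in e_shoff,
// ", World!" in p_paddr and "\n" in p_filesz.
const MESSAGE_IN_HEADER: usize = 5 + 8 + 1;

/// Number of header bytes the script writes.
fn transplanted_header_len() -> usize {
    ELF64_HEADER + PROGRAM_HEADER - P_ALIGN
}

/// Bytes left for everything appended after the header.
fn payload_len(total: usize) -> Option<usize> {
    total.checked_sub(transplanted_header_len())
}
```

With a 149 byte total, payload_len gives 37 bytes (0x25) after the 112 byte header, which matches the p_memsz value in the script, and the 14 byte message takes up no extra room because it lives inside the header.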
Ideas
- Put the whole message into one buffer, one less syscall
- Remove ud2 instructions
- Listen to some good music
- ???
Further Reading and Resources
My teeny little project is standing on the shoulders of giants. Here are the resources I used while making it.
- https://mainisusuallyafunction.blogspot.com/2015/01/151-byte-static-linux-binary-in-rust.html
- https://dev.to/szymongib/single-syscall-hello-world-in-rust-part-2-4jj4
- https://os.phil-opp.com/freestanding-rust-binary/
- https://darkcoding.net/software/a-very-small-rust-binary-indeed/
- https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
- https://www.muppetlabs.com/~breadbox/software/tiny/revisit.html
- https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html
- http://timelessname.com/elfbin/
- https://in4k.github.io/wiki/linux
- https://jacobgw.com/blog/zig/low-level/2021/03/15/elf-linux.html
- https://arusahni.net/blog/2020/03/optimizing-rust-binary-size.html