149 Byte Hello World, and How You Can Make Your Rust Project Smaller


I’ve been playing with some simple size coding in Rust over the last several weeks, getting as far as a mere 149 byte Hello World!. This post will first focus on compiling smaller binaries for any Rust project, and then on the size coding techniques I used to get the 149 byte Hello World! [1]. Using these, I was also able to make a 1005 byte brainfuck interpreter.

To get started, I will create a new empty project, and compare the binary size.

cargo init --vcs=none shl

In the main.rs we are simply going to print out Hello World! and quit.

fn main() {
    println!("Hello, World!");
}

cargo b
/bin/ls -l target/debug/shl

Building the project in debug mode yields a 4599104 byte (about 4.4 MB) binary, which is inexcusably big. Even worse, building in release mode didn’t change the size.
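The byte counts in this post are converted using binary units (1 KB = 1024 bytes). As a rough sketch of that conversion, here is a small helper; `format_size` and `file_size` are hypothetical names of my own, not part of any tool used here.

```rust
/// Format a byte count using binary units (1 KiB = 1024 bytes).
/// Hypothetical helper, only meant to show how the sizes are converted.
fn format_size(bytes: u64) -> String {
    const KIB: f64 = 1024.0;
    const MIB: f64 = 1024.0 * 1024.0;
    let b = bytes as f64;
    if b >= MIB {
        format!("{:.1} MiB", b / MIB)
    } else if b >= KIB {
        format!("{:.1} KiB", b / KIB)
    } else {
        format!("{} B", bytes)
    }
}

/// Read the on-disk size of a file, for example "target/debug/shl".
fn file_size(path: &str) -> std::io::Result<u64> {
    Ok(std::fs::metadata(path)?.len())
}
```

Something like `file_size("target/debug/shl").map(format_size)` would then report the size of the debug build, assuming the binary exists at that path.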

5 Quick Size Reductions

These are 5 very simple flags you can paste into your Cargo.toml to instantly decrease the size of your program and maybe even boost the performance. I’m going to assume you require the standard library, need a quick fix and do not want to mess around with size coding. For further information about build profiles as well as these flags, check out the official Cargo Book.

[profile.release]

This simply identifies that the following flags are only for the release profile, activated with --release.

# Optimize for size
opt-level = "z"
# OR optimize for performance
opt-level = 3

This setting controls the optimization level; "z" optimizes for size. There are 6 levels in total (0, 1, 2, 3, "s" and "z"). Level 3 is the default in release mode, so there’s no point defining it explicitly.

# Optimizations across all crates within the dependency graph.
lto = "fat"
# Similar to "fat", but takes substantially less time to run
lto = "thin"

Link Time Optimization is still black magic to me. All you need to know is that it works and usually makes your program smaller and faster. See the Cargo Reference page and the Wikipedia page if you want to know more.

strip = true

Removes debug information and symbols, including variable, function, and type names, source file names, and line references. Always useful to turn on, unless you need to debug crashes in production.

codegen-units = 1

Do not split your crate into multiple smaller parts for compilation. This slows down the build but increases the effectiveness of the other options.

panic = "abort"

Instead of unwinding the stack on panic, just abort the process. Only useful if you know crashes are rare and you don’t need to catch panics or get a stack trace.
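To make the trade-off concrete, here is a minimal sketch of what unwinding buys you. The helper name `panicked` is hypothetical. With the default `panic = "unwind"` a panic can be caught with `std::panic::catch_unwind`; with `panic = "abort"` the process ends before the call returns.

```rust
use std::panic;

/// Run `f` and report whether it panicked.
/// This only works with the default `panic = "unwind"`. Under
/// `panic = "abort"` the process is terminated at the panic instead.
fn panicked<F: FnOnce() + panic::UnwindSafe>(f: F) -> bool {
    panic::catch_unwind(f).is_err()
}
```

If your project relies on this kind of recovery anywhere (some thread pools and test harnesses do), leave `panic` at its default.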

After using all of these flags, our Cargo.toml file looks like this.

[package]
name = "shl"
version = "0.1.0"
edition = "2021"

[profile.release]
opt-level = "z"  # Optimize for size.
lto = true # Use "fat" LTO
strip = true # Remove debug symbols
codegen-units = 1 # Compile in one big chunk
panic = "abort" # Abort on panic instead of unwinding

Compiling with release mode yields a 325944 byte (or 318 KB) large file. Better, but there’s still more to go.

The STD

The Rust Standard Library (or std for short) is very useful if you don’t want to reinvent the wheel, and is plenty fast, but at the cost of size. For those who don’t know, the entire standard library is statically linked when compiling, even if you don’t need the entirety of it. When this becomes a problem, you have two options:

Build it

The reasonable thing you might want to try is to rebuild the standard library, although you will need the nightly toolchain (which might also make your files smaller and code faster) and lots of patience.

cargo build --release -Z build-std --target x86_64-unknown-linux-gnu

Ditching it

Simply adding

#![no_std]

to the top of your main.rs will completely remove the standard library. You still have access to functions like printf from libc, and there are many crates that will work without std. Without the standard library, you will need to define your own panic handler.
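Dropping std does not mean dropping everything: the `core` crate is still there, including text formatting. As a rough illustration (the `StackBuf` type is hypothetical and not part of this project), the sketch below formats into a fixed-size buffer using only `core` paths, so it should also compile in a `#![no_std]` crate.

```rust
use core::fmt::{self, Write};

/// Fixed-capacity text buffer that needs neither a heap nor std.
struct StackBuf {
    data: [u8; 64],
    len: usize,
}

impl StackBuf {
    const fn new() -> Self {
        StackBuf { data: [0; 64], len: 0 }
    }

    fn as_str(&self) -> &str {
        // Only whole `&str` chunks are ever copied in, so the bytes are valid UTF-8.
        core::str::from_utf8(&self.data[..self.len]).unwrap_or("")
    }
}

impl Write for StackBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.data.len() {
            return Err(fmt::Error); // refuse to overflow instead of truncating
        }
        self.data[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}
```

With `Write` in scope, `write!(buf, "Hello, {}!", "World")` fills the buffer, and `buf.as_str()` gives you something you can hand to whatever output routine you have left.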

For this example, I will also ditch the main function, and define my own.

#![no_main]
#![no_std]

#![feature(rustc_private)]
extern crate libc;

#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const u8) -> isize {
    const HELLO: &'static str = "Hello, World!\n\0";

    unsafe {
        libc::printf(HELLO.as_ptr() as *const _);
    }

    0
}

#[panic_handler]
fn my_panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

Building with the nightly toolchain yields a 14224 byte (or 14 KB) large file. Not bad, but not great either.

Cutting Down to the Bone (and beyond)

We have reached the furthest point where everyday projects can go, but if you’re a nerd like me, there’s plenty of fat we can shave off.

At this stage, the executable has 25 sections, even though you only need a few. The easiest way to get rid of most of them is to define some Rust compiler flags.

export RUSTFLAGS="-Ctarget-cpu=native -Clink-args=-nostartfiles -Crelocation-model=static
-Clink-args=-Wl,-n,-N,--no-dynamic-linker,--no-pie,--build-id=none,--no-eh-frame-hdr"

We tell rustc to target the native CPU (enabling extensions such as SSE3), to skip the C startup files, and to generate static (non-relocatable) code. The linker flags tell ld not to page-align sections, not to set a dynamic linker or produce a position-independent executable, and to leave out the build ID and the .eh_frame_hdr section.

Since we have removed libc, we need to use inline assembly to write to the standard output.

#![no_std]
#![no_main]

const MSG: &'static str = "Hello, World!\n";

use core::arch::asm;

#[no_mangle]
pub extern "C" fn _start(_argc: isize, _argv: *const *const u8) {
    write_to_std_out(MSG.as_ptr(), MSG.len());

    exit(0);
}

fn write_to_std_out(string_pointer: *const u8, string_length: usize) {
    unsafe {
        asm!(
            "syscall",
            in("rax") 1, // write syscall number
            in("rdi") 1, // stdout file descriptor, 2 is stderr
            in("rsi") string_pointer,
            in("rdx") string_length,
            out("rcx") _, // clobbered by syscalls
            out("r11") _, // clobbered by syscalls
            lateout("rax") _, // clobbered by syscalls, 
                              // if you can't print more than once, you are missing this
        );
    }
}

fn exit(code: i32) {
    unsafe {
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") code,
            options(noreturn)
        );
    }
}

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

With this, we get an even smaller file with only 624 bytes and 7 sections, which is still 7 too many.
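If you want to check numbers like the section count without a hex editor, the relevant fields sit at fixed offsets in the ELF64 header. The following is a minimal sketch, not the tooling I used; `ElfSummary` and `summarize_elf64` are hypothetical names, and it only handles little-endian 64-bit files.

```rust
/// The handful of ELF64 header fields that matter for size coding.
#[derive(Debug, PartialEq)]
struct ElfSummary {
    entry: u64, // e_entry, offset 0x18
    phnum: u16, // e_phnum, offset 0x38
    shnum: u16, // e_shnum, offset 0x3C
}

fn le_u16(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn le_u64(b: &[u8], off: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(raw)
}

/// Returns `None` unless `bytes` starts with a little-endian ELF64 header.
fn summarize_elf64(bytes: &[u8]) -> Option<ElfSummary> {
    if bytes.len() < 64 || &bytes[..4] != b"\x7fELF" {
        return None;
    }
    if bytes[4] != 2 || bytes[5] != 1 {
        return None; // not 64-bit or not little-endian
    }
    Some(ElfSummary {
        entry: le_u64(bytes, 0x18),
        phnum: le_u16(bytes, 0x38),
        shnum: le_u16(bytes, 0x3C),
    })
}
```

Feeding it the result of `std::fs::read("target/release/shl")` should give the entry point plus the program header and section header counts for the current build.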

We can remove the comment section with a simple command.

objcopy -R .comment target/release/shl target/release/shl

Now we are even closer, at just 496 bytes. Looking at the file in a hex editor such as ImHex, you may notice quite a lot of padding after the actual code.

We can remove this by using sstrip from the ELFkickers collection.

sstrip target/release/shl


That’s better, we are now at 215 bytes. The only thing left to do now is to transplant the ELF header. Not only can we remove one program header, but we can also store data in some unused fields. I wrote a simple Python script that does exactly that, and it is available in the smallest-hello-rs GitHub repository.

import struct

with open('shl', 'rb+') as file:
    data = bytearray(file.read())

    # Calculate the entry point offset
    file.seek(0x18)
    entry_point_offset: int = struct.unpack('B', file.read1(1))[0]
    print(
        f"Entry point is at byte {entry_point_offset} ({hex(entry_point_offset)})")

    file.seek(entry_point_offset)
    bytes_after_entry_point: bytes = file.read1(len(data))

    # Clear the existing ELF header
    file.seek(0)
    file.truncate()

    data = bytes([
        0x7F, 0x45, 0x4C, 0x46, # 4b, magic number ("\x7fELF")
        0x02, # 1b, class, 64bit
        0x01, # 1b, endianness, LE
        0x01, # 1b, ELF Version
        0x00, # 1b, ABI type, normally 0 (SystemV), probably fine to overwrite with data
        0x00, # 1b, ABI version, normally 0, probably fine to overwrite with data
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, # 7b, padding at the end of e_ident
        0x02, 0x00, # 2b, e_type, executable
        0x3E, 0x00, # 2b, e_machine, AMD x86-64
        0x01, 0x00, 0x00, 0x00, # 4b, e_version
        
        # 8b, e_entry, entry point
        entry_point_offset - 32 - 16 - 16, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 
        
        # 8b, e_phoff, start of program header
        0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        
        # 8b, e_shoff, start of section header table
        # There are no section headers, so the last 5 bytes hold "Hello"
        0x00, 0x00, 0x00, 0x48, 0x65, 0x6C, 0x6C, 0x6F,

        # 4b, e_flags
        0x00, 0x00, 0x00, 0x00,

        # 2b, e_ehsize, Contains the size of this header, normally 64 Bytes for 64-bit and 52 Bytes for 32-bit format
        0x40, 0x00,
        
        # 2b, e_phentsize, size of a program header entry, 56b (0x38)
        0x38, 0x00,

        # 2b, e_phnum, number of entries in program header table
        0x01, 0x00,

        # 2b, e_shentsize, size of section header table entry
        0x40, 0x00,

        # 2b, e_shnum, number of section header entries
        0x00, 0x00,

        # 2b, e_shstrndx
        0x00, 0x00,
        

        ### /// PROGRAM HEADERS ///

        # 4b, p_type, 1 for loadable
        0x01, 0x00, 0x00, 0x00,

        # 4b, p_flags
        # 0x07 = +RWX
        # The rest is data
        0x07, 0x00, 0x00, 0x00,
        
        # 8b, p_offset, offset of the segment in the file image
        0xB0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

        # 8b, p_vaddr, virtual addr of segment in memory start of this segment
        0xB0, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,

        # 8b, p_paddr, same as vaddr except on physical systems
        #0xB0, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
        # Seems it's fine replacing it with data?
        0x2C, 0x20, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x21,

        # 8b, p_filesz
        # The first byte was 0x25 but I overwrote it into a newline (0A)
        0x0A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        
        # 8b, p_memsz
        0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

        # 8b, p_align
        # it doesn't segfault when I comment it out so f it
        #0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    ])

    file.write(data)
    file.write(bytes_after_entry_point)


Looking at it now, we have made a 149 byte binary from scratch, and not only that, we have made it programmatically so it can be repeated at any time, without manual modifications.
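As a sanity check, the 149 bytes can be accounted for from the values in the script. This is a back-of-the-envelope sketch; the code size is an assumption read off the `p_memsz` value (0x25), and the constant names are mine.

```rust
const ELF64_HEADER: usize = 64;   // e_ehsize, 0x40 in the script
const PROGRAM_HEADER: usize = 56; // e_phentsize, 0x38 in the script
const OMITTED_P_ALIGN: usize = 8; // the commented-out last field of the program header
const CODE: usize = 0x25;         // assumption: code size taken from p_memsz

/// Header bytes written by the script plus the code appended after them.
const fn total_size() -> usize {
    ELF64_HEADER + PROGRAM_HEADER - OMITTED_P_ALIGN + CODE
}
```

That is 64 + 56 - 8 = 112 bytes of header and 37 bytes of code, which adds up to 149.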

If you want to play around with the binary in a hex editor, it’s on GitHub under the WTFPL.

Ideas

Further Reading and Resources

My teeny little project is standing on the shoulders of giants, here are the resources I used while making it.


  1. In fact, I think it might actually be the smallest one in Rust; the only other one I could find was 151 bytes.