unsafe {
asm!(
"mov edi, 42",
"mov eax, 60",
"syscall",
options(nostack, noreturn)
)
// nostack promises the asm doesn't touch the stack, so the compiler
// skips its usual push/pop bookkeeping around the block
// noreturn tells the compiler control never comes back, so instead of
// falling through to a 'ret' it places a ud2 (undefined instruction) trap after the block
}
and
> We will need to tell the C compiler that we’re providing our own entry point, telling it not to include its own start files.
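If you want to poke at raw syscalls without setting up a custom entry point, the same pattern works from an ordinary `main`, using a syscall that returns so you can inspect the result. A sketch for x86-64 Linux only; the `raw_write` helper name is mine, and it uses `asm!` register operands rather than hand-written `mov`s:

```rust
use std::arch::asm;

// Raw write(2) on x86-64 Linux: syscall number in rax, args in rdi/rsi/rdx;
// the kernel clobbers rcx (return address) and r11 (saved flags).
fn raw_write(fd: i32, buf: &[u8]) -> i64 {
    let ret: i64;
    unsafe {
        asm!(
            "syscall",
            inout("rax") 1i64 => ret, // 1 = SYS_write; result comes back in rax
            in("rdi") fd as i64,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            out("rcx") _,             // clobbered by the syscall instruction
            out("r11") _,
            options(nostack)
        );
    }
    ret
}

fn main() {
    let n = raw_write(1, b"hello from a raw syscall\n");
    assert_eq!(n, 25); // write returns the number of bytes written
}
```

Unlike the exit snippet, this one doesn't need `noreturn`, since control comes back to Rust after the syscall.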
So it's a Rust program but it's just calling inline assembly and using a C compiler?
Rust uses the C compiler as a linker, because this is often the only way to ensure all the libraries needed by the system toolchain are included. (Compare to the CCLD variable in autotools -- it refers to the command to use the C compiler as a linker, and exists for this very reason.)
This isn't only libc -- it also includes libgcc (or compiler-rt, depending on your system toolchain), which, despite the name, may still be called "behind your back" by the LLVM toolchain.
> So it's a Rust program but it's just calling inline assembly and using a C compiler?
Yeah, I think this article is more in the tradition of [0] (but trying hard not to drop rustc) than being completely practical advice on making the binary you ship to users smaller.
Well, not quite -- https://github.com/grahamking/demeter-deploy/blob/master/see... is the Rust version of a program that used to be written entirely in assembly, and it seems that it ends up being the same size. There's a few bits of asm in amongst the Rust, but it's still definitely a Rust program.
800 lines of ASM file reduced to 600 lines of Rust, including comments and constants in both cases. He might be pushing the limits, and everything's unsafe Rust, but unsafe Rust is still safer than raw assembly.
I don't think I would go that far. Assembly doesn't have undefined behavior, and especially not with the strict constraints around references as in Rust. The safe/unsafe dichotomy in Rust only beats plain C or C++ when the invariant-breaking parts sit behind concise, robust encapsulations.
> I don't think I would go that far. Assembly doesn't have undefined behavior
As someone who has written a fair amount of assembler over the years... Yes, it doesn't have undefined behavior, but it also lacks practically all guard rails and safeties.
The smallest error and you might do things like completely messing up your call stack – just forget one "POP" or botch a stack-pointer adjustment. Or, for example, take a computed jump into the middle of an instruction.
You can create bugs that are almost impossible to figure out from a crash dump – bugs that even something as low-level as C will effectively protect you from.
I wonder if those issues can't be somewhat mitigated with a linter or interactive emulator. In any case, I think assembly is more uniformly difficult (and not portable!), while unsafe Rust generally feels less painful, but you might have no idea which invariants you need to enforce unless you're very knowledgeable. Definitely don't write a whole application in either!
Which ones? I assume at least the 1:1 machine code kind doesn't, and you mean something more like bytecode, but it'd be interesting if I'm wrong on that count.
> I assume at least the 1:1 machine code kind doesn't
They do, because the machine code sometimes has undefined behaviour. E.g. on the 6502 famously 1/4 of the instructions are undefined (all of the 0b.....11 ones), and many of them behave differently on different implementations of the processor (up to and including halting it, or placing it in a strange state).
Most of those undocumented instructions are what in C would be called "implementation defined behaviour" and have properly defined results, they might just differ between specific CPU models. There's only a very small number of unstable instructions which have unpredictable results (caused by "cross talk" due to incomplete instruction decoding).
Note that that step sheds libc entirely (so the binary needs to provide the minimal things that libc does for your platform, namely that assembly you mention, and you'd have to do the same for a C binary that did that) and gets rid of 3kb (16kb -> 13kb), but changing the linker flags to avoid page-aligning the binary brings it down to 400 bytes. I would have loved if the author had tried that on the libc version too, just for comparison's sake.
In a lot of conversations around Rust binary sizes some people extrapolate from the "Hello, World!" size difference as if the additional cost on top of a bare C binary was linear, when in reality it is (approximately) a constant cost. That on top of completely disregarding that the "bloat" is doing something (panic machinery, string formatting, DWARF symbol storage, DWARF symbol parsing, etc.).
I found that stripping out libc made it impossible for me to handle signals that didn't exit the program. E.g. SIGINT worked fine as long as the callback didn't return to the caller, but e.g. SIGWINCH or SIGCONT segfaulted, and I never found a way to make that work from scratch in this type of binary without linking to libc... I wonder if that's even possible.
> That on top of completely disregarding that the "bloat" is doing something (panic machinery, string formatting, DWARF symbol storage, DWARF symbol parsing, etc.).
The point is that it'll never be doing something, and the compiler can clearly see that, but decides to add that dead code anyway.
println can panic, which by default tries to print to stderr (which can succeed independently of the state of stdout), and the global panic handler can be overridden to write to a log file, pipe an event to a port, or just abort. But specialising the panic handler to instead only abort for the specific case where the only reason for a panic would be println is gold plating of the highest order. Adding complexity to the compiler and language to code-golf the hello-world binary size because 260kb is too much bloat is time I'd rather spend doing things that help the 99.99999% of other cases.
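For the record, overriding the global hook really is a one-liner with the stable `std::panic` API. A sketch — the handler body and message are made up for the example:

```rust
use std::panic;

fn main() {
    // Swap the default "print message + backtrace to stderr" hook for our
    // own sink; a real program might log to a file or push an event instead.
    panic::set_hook(Box::new(|info| {
        eprintln!("custom panic hook: {info}");
    }));

    // catch_unwind lets the demo observe the hook firing without dying.
    let result = panic::catch_unwind(|| panic!("boom"));
    assert!(result.is_err());
    println!("recovered after the panic");
}
```

(With `panic = "abort"` in the profile there is no unwinding to catch, which is exactly the machinery the size-golfing builds throw away.)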
It's definitely not a constant cost, presumably due to the link-time optimization that rustc does. I've had binaries go from 800kB to 6MB simply by switching from getopts to the clap crate, for example.
Binary using clap with all the bells and whistles, even without LTO, is 900KB after strip.
The standard library has 4MB of debug info baked in, which due to its special integration with Cargo is always added, even when you explicitly configure `debug=false`. This is what usually surprises people and makes Rust executables seem huge.
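A hedged workaround sketch, assuming Cargo 1.59 or newer: the profile-level `strip` option does remove the precompiled standard library's debug info from the final binary, even though `debug = false` alone does not:

```toml
# Cargo.toml
[profile.release]
debug = false        # doesn't touch std's own baked-in debug info
strip = "debuginfo"  # this does; use true or "symbols" to drop symbols too
```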
I am not much concerned with hyper optimizations, but I was curious how OCaml would fare with the initial, simple steps, before things get crazy. But I opted for a more complex program:
(* t.ml *)
let () = print_endline "Hello, World!"
Then just doing a standard compilation and a strip:
$ ocamlopt -o t t.ml && ls -l -h t | cut -d " " -f5
1.5M
$ strip t && ls -l -h t | cut -d " " -f5
356K
$ ./t
Hello, World!
I may be overlooking something, and would be interested to learn what if so, but I was surprised we got a result smaller than the rust binary in the first instance.
It is interesting that before stripping the Rust version is bigger, but after nothing more than a strip the OCaml version is the bigger one. It'd be nice to try and see what the "extra" info that Rust ships by default is.
If you're building rust on windows, try using the msvc toolchain. I switched scryer-prolog from `x86_64-pc-windows-gnu` to `x86_64-pc-windows-msvc` and got a binary size reduction of 10x, from 100MB to 10MB.
I am also disappointed that the standard library was shunned, and even the entrypoint. What is the smallest Rust binary you can produce while still writing Rust, is my question?
edit: It doesn't look like any of the techniques listed after that work if you still use the standard library.
I think the most important takeaway from this article is that the defaults are crazy inefficient; they seem like pessimisation. I've seen this a lot with "modern" toolchains and languages --- there's zero thought spent on efficiency, and an attitude of not caring at all about how much time or space something should take.
We went from 3.6 MiB to 400 bytes.
In an ideal world, you'd get those 400 bytes from the compiler when you set it to optimise for minimum size and give it the same "simplest possible Rust program", and without optimisation the output might be a little bit larger, but not 4 orders of magnitude larger.
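Today you have to opt into that explicitly; the usual size knobs look something like this (a config sketch — the exact savings vary by toolchain and program):

```toml
# Cargo.toml
[profile.release]
opt-level = "z"    # optimise for size rather than speed
lto = true         # whole-program link-time optimisation
codegen-units = 1  # trade parallel codegen for better optimisation
panic = "abort"    # drop the unwinding machinery
strip = true       # remove symbols and debug info
```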
Frankly, 3.6MB is nothing short of insane for a program that does little more than exit. That's more than 2 floppies! We can add "Rust program that does nothing" to the "List of Things That Turbo Pascal is Smaller Than" (https://news.ycombinator.com/item?id=22843140)
I wonder what it is that leads to such inefficiency. It would seem reasonable that a program that does very little should also not contain much code, and therefore that a compiler shouldn't generate much code. Yet somewhere along the way, drenched in multiple layers of abstraction, we've lost common sense?
A program that returns 42 is a 5-byte file in DOS: b8 2a 4c cd 21. (And if they'd been a little more thoughtful on the initial API, it could've been 3 bytes.)
In general I agree, but there may be cases where some bloat is justified (like getting readable callstacks on a crash, which may require at least some amount of debug information). Having said that, 3.5 MBytes for a "do-nothing" exe is quite crazy and should have been addressed long ago. I would expect somewhere between 4 and (to be generous) 64 KBytes.
IMHO it's nothing to do with "funding" (look at what comes out of the demoscene, for example); or perhaps you're espousing the same mentality that causes this. There isn't a need to "prioritize" anything, because these are really problems caused by doing unnecessary work in the first place: generating code for which the results of its execution will never be needed.
That's a weird rhetorical flourish. If there isn't a need to prioritize work to fix the thing we're complaining about, I guess we can stop complaining about it.
That's what I'm saying: there would've been nothing to fix if they had cared about efficiency from the beginning, because such inefficiency would never have been created.
If your ideal world is one without debug symbols, backtracing, and crash handling, then sure. It's not inefficiency to provide those features by default. If you want to turn them off, you can.
If you want to go even smaller, get strip from binutils >= 2.41 and use `--strip-section-headers`. Using this option gets the basic exit down from 352 bytes to 140 bytes, and a hexdump shows that it is literally just the ELF header, program header(s), and text/data concatenated together - about as minimal as it gets without playing dirty header/code overlap tricks.
Note that the assembly code is suboptimal, since it wastes 4 bytes for each immediate.
The optimal assembly code is this: "mov al, 42; xchg edi, eax; mov al, 60; syscall". Or just "mov al, 60; syscall" if you want to exit with 0 rather than 42.