unsafe {
asm!(
"mov edi, 42",
"mov eax, 60",
"syscall",
options(nostack, noreturn)
)
// nostack promises the asm doesn't touch the stack, so the compiler
// skips its usual push/pop bookkeeping around the block
// noreturn tells the compiler control never comes back, so instead of
// falling through to a 'ret' it places a ud2 (undefined instruction) trap after the block
}
and
> We will need to tell the C compiler that we’re providing our own entry point, telling it not to include its own start files.
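If you want to poke at raw syscalls without setting up a custom entry point, the same pattern works from an ordinary `main`, using a syscall that returns so you can inspect the result. A sketch for x86-64 Linux only; the `raw_write` helper name is mine, and it uses `asm!` register operands rather than hand-written `mov`s:

```rust
use std::arch::asm;

// Raw write(2) on x86-64 Linux: syscall number in rax, args in rdi/rsi/rdx;
// the kernel clobbers rcx (return address) and r11 (saved flags).
fn raw_write(fd: i32, buf: &[u8]) -> i64 {
    let ret: i64;
    unsafe {
        asm!(
            "syscall",
            inout("rax") 1i64 => ret, // 1 = SYS_write; result comes back in rax
            in("rdi") fd as i64,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            out("rcx") _,             // clobbered by the syscall instruction
            out("r11") _,
            options(nostack)
        );
    }
    ret
}

fn main() {
    let n = raw_write(1, b"hello from a raw syscall\n");
    assert_eq!(n, 25); // write returns the number of bytes written
}
```

Unlike the exit snippet, this one doesn't need `noreturn`, since control comes back to Rust after the syscall.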
So it's a Rust program but it's just calling inline assembly and using a C compiler?
Rust uses the C compiler as a linker, because this is often the only way to ensure all the libraries needed by the system toolchain are included. (Compare to the CCLD variable in autotools -- it refers to the command to use the C compiler as a linker, and exists for this very reason.)
This isn't only libc -- it also includes libgcc (or compiler-rt, depending on your system toolchain), which, despite the name, may still be called "behind your back" by the LLVM toolchain.
> So it's a Rust program but it's just calling inline assembly and using a C compiler?
Yeah, I think this article is more in the tradition of [0] (but trying hard not to drop rustc) than being completely practical advice on making the binary you ship to users smaller.
Well, not quite -- https://github.com/grahamking/demeter-deploy/blob/master/see... is the Rust version of a program that used to be written entirely in assembly, and it seems that it ends up being the same size. There's a few bits of asm in amongst the Rust, but it's still definitely a Rust program.
800 lines of ASM file reduced to 600 lines of Rust, including comments and constants in both cases. He might be pushing the limits, and everything's unsafe Rust, but unsafe Rust is still safer than raw assembly.
I don't think I would go that far. Assembly doesn't have undefined behavior, and especially not with the strict constraints around references as in Rust. The safe/unsafe dichotomy in Rust only beats plain C or C++ when the invariant-breaking parts sit behind concise, robust encapsulations.
> I don't think I would go that far. Assembly doesn't have undefined behavior
As someone who has written a fair amount of assembler over the years... Yes, it doesn't have undefined behavior, but it also lacks practically all guard rails and safeties.
The smallest error and you might do things like completely messing up your call stack – just forget one "POP" or botch a stack-pointer adjustment. Or, for example, take a computed jump into the middle of an instruction.
You can create bugs that are almost impossible to figure out from a crash dump – bugs that even something as low-level as C will effectively protect you from.
I wonder if those issues can't be somewhat mitigated with a linter or interactive emulator. In any case, I think assembly is more uniformly difficult (and not portable!), while unsafe Rust generally feels less painful, but you might have no idea which invariants you need to enforce unless you're very knowledgeable. Definitely don't write a whole application in either!
Which ones? I assume at least the 1:1 machine code kind doesn't, and you mean something more like bytecode, but it'd be interesting if I'm wrong on that count.
> I assume at least the 1:1 machine code kind doesn't
They do, because the machine code sometimes has undefined behaviour. E.g. on the 6502 famously 1/4 of the instructions are undefined (all of the 0b.....11 ones), and many of them behave differently on different implementations of the processor (up to and including halting it, or placing it in a strange state).
Most of those undocumented instructions are what in C would be called "implementation defined behaviour" and have properly defined results, they might just differ between specific CPU models. There's only a very small number of unstable instructions which have unpredictable results (caused by "cross talk" due to incomplete instruction decoding).
Note that that step sheds libc entirely (so the binary needs to provide the minimal things that libc does for your platform, namely that assembly you mention, and you'd have to do the same for a C binary that did that) and gets rid of 3kb (16kb -> 13kb), but changing the linker flags to avoid page-aligning the binary brings it down to 400 bytes. I would have loved if the author had tried that on the libc version too, just for comparison's sake.
In a lot of conversations around Rust binary sizes some people extrapolate from the "Hello, World!" size difference as if the additional cost on top of a bare C binary was linear, when in reality it is (approximately) a constant cost. That on top of completely disregarding that the "bloat" is doing something (panic machinery, string formatting, DWARF symbol storage, DWARF symbol parsing, etc.).
I found that stripping out libc made it impossible for me to handle signals that didn't exit the program. E.g. SIGINT worked fine as long as the callback didn't return to the caller, but e.g. SIGWINCH or SIGCONT segfaulted, and I never found a way to make that work from scratch in this type of binary without linking to libc... I wonder if that's even possible.
> That on top of completely disregarding that the "bloat" is doing something (panic machinery, string formatting, DWARF symbol storage, DWARF symbol parsing, etc.).
The point is that it'll never be doing something, and the compiler can clearly see that, but decides to add that dead code anyway.
println can panic, which by default tries to print to stderr (which can succeed independently of the state of stdout), and the global panic handler can be overridden to write to a log file, pipe an event to a port, or just abort. But specialising the panic handler to instead only abort for the specific case where the only reason for a panic would be println is gold plating of the highest order. Adding complexity to the compiler and language to code-golf the hello-world binary size because 260kb is too much bloat is time I'd rather spend doing things that help the 99.99999% of other cases.
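For the record, overriding the global hook really is a one-liner with the stable `std::panic` API. A sketch — the handler body and message are made up for the example:

```rust
use std::panic;

fn main() {
    // Swap the default "print message + backtrace to stderr" hook for our
    // own sink; a real program might log to a file or push an event instead.
    panic::set_hook(Box::new(|info| {
        eprintln!("custom panic hook: {info}");
    }));

    // catch_unwind lets the demo observe the hook firing without dying.
    let result = panic::catch_unwind(|| panic!("boom"));
    assert!(result.is_err());
    println!("recovered after the panic");
}
```

(With `panic = "abort"` in the profile there is no unwinding to catch, which is exactly the machinery the size-golfing builds throw away.)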
It's definitely not a constant cost, presumably due to the link-time optimization that rustc does. I've had binaries go from 800kB to 6MB simply by switching from getopts to the clap crate, for example.
Binary using clap with all the bells and whistles, even without LTO, is 900KB after strip.
The standard library has 4MB of debug info baked in, which due to its special integration with Cargo is always added, even when you explicitly configure `debug=false`. This is what usually surprises people and makes Rust executables seem huge.
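A hedged workaround sketch, assuming Cargo 1.59 or newer: the profile-level `strip` option does remove the precompiled standard library's debug info from the final binary, even though `debug = false` alone does not:

```toml
# Cargo.toml
[profile.release]
debug = false        # doesn't touch std's own baked-in debug info
strip = "debuginfo"  # this does; use true or "symbols" to drop symbols too
```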
I am not much concerned with hyper optimizations, but I was curious how OCaml would fare with the initial, simple steps, before things get crazy. But I opted for a more complex program:
(* t.ml *)
let () = print_endline "Hello, World!"
Then just doing a standard compilation and a strip:
$ ocamlopt -o t t.ml && ls -l -h t | cut -d " " -f5
1.5M
$ strip t && ls -l -h t | cut -d " " -f5
356K
$ ./t
Hello, World!
I may be overlooking something, and would be interested to learn what if so, but I was surprised we got a result smaller than the rust binary in the first instance.
It is interesting that before stripping the Rust version is bigger, but after nothing more than a strip the OCaml version is the bigger one. It'd be nice to try and see what the "extra" info that Rust ships by default is.
If you're building rust on windows, try using the msvc toolchain. I switched scryer-prolog from `x86_64-pc-windows-gnu` to `x86_64-pc-windows-msvc` and got a binary size reduction of 10x, from 100MB to 10MB.
I am also disappointed that the standard library was shunned, and even the entrypoint. What is the smallest Rust binary you can produce while still writing Rust, is my question?
edit: It doesn't look like any of the techniques listed after that work if you still use the standard library.
I think the most important takeaway from this article is that the defaults are crazy inefficient; they seem like pessimisation. I've seen this a lot with "modern" toolchains and languages --- there's zero thought spent on efficiency, and an attitude of not caring at all about how much time or space something should take.
We went from 3.6 MiB to 400 bytes.
In an ideal world, you'd get those 400 bytes from the compiler when you set it to optimise for minimum size and give it the same "simplest possible Rust program", and without optimisation the output might be a little bit larger, but not 4 orders of magnitude larger.
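Today you have to opt into that explicitly; the usual size knobs look something like this (a config sketch — the exact savings vary by toolchain and program):

```toml
# Cargo.toml
[profile.release]
opt-level = "z"    # optimise for size rather than speed
lto = true         # whole-program link-time optimisation
codegen-units = 1  # trade parallel codegen for better optimisation
panic = "abort"    # drop the unwinding machinery
strip = true       # remove symbols and debug info
```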
Frankly, 3.6MB is nothing short of insane for a program that does little more than exit. That's more than 2 floppies! We can add "Rust program that does nothing" to the "List of Things That Turbo Pascal is Smaller Than" (https://news.ycombinator.com/item?id=22843140)
I wonder what it is that leads to such inefficiency. It would seem reasonable that a program that does very little should also not contain much code, and therefore that a compiler shouldn't generate much code. Yet somewhere along the way, drenched in multiple layers of abstraction, we've lost common sense?
A program that returns 42 is a 5-byte file in DOS: b8 2a 4c cd 21. (And if they'd been a little more thoughtful on the initial API, it could've been 3 bytes.)
In general I agree, but there may be cases where some bloat is justified (like getting readable callstacks on a crash, which may require at least some amount of debug information). Having said that, 3.5 MBytes for a "do-nothing" exe is quite crazy and should have been addressed long ago. I would expect somewhere between 4 and (to be generous) 64 KBytes.
IMHO it's nothing to do with "funding" (look at what comes out of the demoscene, for example); or perhaps you're espousing the same mentality that causes this. There isn't a need to "prioritize" anything, because these are really problems caused by doing unnecessary work in the first place: generating code for which the results of its execution will never be needed.
That's a weird rhetorical flourish. If there isn't a need to prioritize work to fix the thing we're complaining about, I guess we can stop complaining about it.
That's what I'm saying: there would've been nothing to fix if they had cared about efficiency from the beginning, because such inefficiency would never have been created.
If your ideal world is one without debug symbols, backtracing, and crash handling, then sure. It's not inefficiency to provide those features by default. If you want to turn them off, you can.
If you want to go even smaller, get strip from binutils >= 2.41 and use `--strip-section-headers`. Using this option gets the basic exit down from 352 bytes to 140 bytes, and a hexdump shows that it is literally just the ELF header, program header(s), and text/data concatenated together - about as minimal as it gets without playing dirty header/code overlap tricks.
Note that the assembly code is suboptimal, since it wastes 4 bytes for each immediate.
The optimal assembly code is this: "mov al, 42; xchg edi, eax; mov al, 60; syscall". Or just "mov al, 60; syscall" if you want to exit with 0 rather than 42.