Well, the idea here is that with libraries in pure source form, a partial evaluator could be able to specialize and shrink down the program automatically, as you describe. So you'd get the best of both worlds.
I'm not aware of any systems that actually do this on a global level, but I have a feeling I may be surprised if I start looking around.
Well, link time optimization does this for C code in principle. It is fairly new in gcc and clang so it is not clear what it buys you in real world cases. A lot of code cant be compiled out because it is hard to prove it cannot be called. I plan to experiment with LTO and compiling a kernel and an application together at some point to see how it goes.
I'm not aware of any systems that actually do this on a global level, but I have a feeling I may be surprised if I start looking around.