This is the best outline of the Go generics issue I've seen. As an outsider to the language (for now), thanks for writing this.
What's the current conversation on sharing binary code? Unless I'm missing something, this is fundamentally incompatible with generic functions that are optimized at compile time. Either the compiled version has to have implementations for "every valid type" or the shared code must be recompiled at the time it is consumed, right?
It looks like the Go ecosystem tends to discourage distributing precompiled binary packages, so maybe this is not a real issue.
Going meta regarding tech articles, I find that it is easy to read some text-heavy articles (this being one of them) while my eyes glaze over other text-heavy articles (I couldn't easily follow Dave Cheney's article on the same topic of Go generics a few days back, though it had a slightly different focus and was much shorter). I've wondered what it is about specific pages/articles/writing that makes them flow nicely and easy to read -- is it the layout of the page (font/font-size/other styling), or how the writing flows together, or how much thought the author put into presenting what they are thinking, or is it me (time of day, interests, etc.)? Maybe it is just everything!
While the typesetting helps here, I think a lot is due to the writing style. Writing easy-to-read content is difficult, and it’s usually not that obvious when you’re reading it, because you’re absorbing the concepts rather than struggling with the language.
If I remember correctly, Go only allows source packages and does whole-program optimization. They designed their compiler to be fast to enable this.
The author promotes code generation as a practical solution to some problems, but doesn't mention its huge drawbacks for generics, which are why it's generally not used for them.
The reason is that while you can generate a specific implementation when you need it, all the client code will still be using non-generic code. You can't write a function that takes a Set(T) as argument, returns Set(T) as a return value, or receives a Set(T) from a called function — unless you make that part of your code generation pipeline. So the code generation becomes viral, infecting each piece of code that wants to be generic. The entire language needs to be transpiled.
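To make that virality concrete, here is a rough Go sketch of what a generator might emit for T=int (the names SetInt and UnionInt are hypothetical, not from any real tool):

```go
package main

import "fmt"

// Hypothetical output of a code generator run for T=int.
// There is no way to write "func Union(a, b Set(T)) Set(T)"
// generically; the concrete type must appear everywhere.
type SetInt map[int]struct{}

func (s SetInt) Add(v int)      { s[v] = struct{}{} }
func (s SetInt) Has(v int) bool { _, ok := s[v]; return ok }

// This function is now tied to SetInt. To also handle sets of
// strings, the generator must emit a separate UnionString --
// the concrete types leak into every caller's signature.
func UnionInt(a, b SetInt) SetInt {
	out := SetInt{}
	for v := range a {
		out.Add(v)
	}
	for v := range b {
		out.Add(v)
	}
	return out
}

func main() {
	a, b := SetInt{}, SetInt{}
	a.Add(1)
	b.Add(2)
	fmt.Println(len(UnionInt(a, b))) // 2
}
```

Any package that wants to pass sets around generically has to run the same generator with the same settings, which is exactly the "infection" described above.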
You could lower the barrier considerably by supporting some kind of pluggable preprocessing ("//go:preprocess my_generics_syntax_plugin"), but you'd still run into the problem with combining packages from multiple sources that all wanted to share the same generic types. They would all have to be transpiled the same way.
Here it depends on how you want to use generics. For map<T1, T2> this isn't much of an issue, since whenever you use it you'll probably be using a concrete instantiation.
Your point still stands though, which is why proper generics aren't something you 'tack on' to a language. You have to be prepared for them to be used pervasively. This is probably why templates run so deep in the C++ standard library: an attempt to ensure templates really are 'part of the language'.
Sadly, in this case it really expanded the language.
Having a lot of code go through generics isn't necessarily an issue. The potential issue comes from having a lot of different concrete instantiations of your abstract code. That issue might be solvable by smarter compilers.
Within Go this is not an issue, because Go dependencies are source-based. Outside of Go (say, when compiling a C-compatible library) it's trickier, but I don't immediately see why Go generics can't reduce to the intuitive pattern you'd get from selective/sparse template instantiation in your dependency before it's included.
They're source-based, but that still requires that the caller (calling into a package that's using generated code for generics) use the exact same transpilation tech to generate its own code; if it doesn't, it can only interface with the concrete instances of the generic types (e.g. Set_int or whatever).
Generics spreading throughout your app is not the problem; the problem is that a specialized code-generation solution would have to work this way. Both apps and libraries would need the same level of support, and at the end of the day you've invented a new language that happens to transpile to Go, on which you'll want to build everything. (In fact, such a language has already been created [1].) It's the awkwardness of the split between ordinary code and transpilation that causes the issues, along with the resulting interop problems and fragmentation/competition between flavours.
I spent a bit of time thinking about this when I was younger, and got as far as thinking about the compiler or runtime relying on some sort of code reuse to improve the space complexity of generics. Each use would have to have its own metadata but should be able to share many function bodies.
It won’t work for primitives but I find myself using arrays of primitives less and less.
Swift does this. A generic function compiles to a single actual function, which accepts metadata about the types passed in, allowing it to do the things it needs. The optimizer may, if it decides it's worthwhile, generate specialized versions of the function, but that's never necessary.
I'm guessing the final Go solution will be like this, but with both runtime/boxed and static/type-switch implementations generated by default, and some heuristic for choosing between or composing them.
She mentions the C, C++, and Java approaches. But I think she should have added the C# approach: all the advantages of the Java approach, without the slowness of casting.
Casting isn't what's slow; it's the boxing. A generic List<T> isn't much slower than the non-generic List if T is some class. But if T is, say, an integer, then you're paying for the implicit boxing and unboxing.
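The same boxing effect can be shown in Go terms, comparing an interface{}-based container with a type-parameterized one (Go 1.18+); the container types here are just a sketch for illustration:

```go
package main

import "fmt"

// Non-generic container: every int stored in it is boxed into
// an interface value (a type word plus a data pointer).
type AnyList struct{ items []interface{} }

func (l *AnyList) Add(v interface{})     { l.items = append(l.items, v) }
func (l *AnyList) Get(i int) interface{} { return l.items[i] }

// Generic container: ints are stored unboxed, and no runtime
// type assertion is needed on the way out.
type List[T any] struct{ items []T }

func (l *List[T]) Add(v T)     { l.items = append(l.items, v) }
func (l *List[T]) Get(i int) T { return l.items[i] }

func main() {
	boxed := &AnyList{}
	boxed.Add(42)
	n := boxed.Get(0).(int) // unboxing: runtime type assertion

	typed := &List[int]{}
	typed.Add(42)
	m := typed.Get(0) // no assertion, no boxing

	fmt.Println(n, m) // 42 42
}
```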
A generic List<T> isn't much slower than the non-generic List if T is some class.
This is only true for generics using erasure, such as in Java, since Java stores Objects (pointers to objects) in both cases. But in languages that use e.g. monomorphization, a List<T> stores the Ts adjacent in memory, since the size of each element is known, whereas a non-generic list has to store pointers to objects, since the object sizes are not known and may vary per element. A monomorphized List<T> has two large benefits: fewer cache misses due to locality, and more opportunities to inline T's method implementations.
You can only do that if you can prove that T will never have subclasses that are larger.
To check that, the compiler either has to see all code that will run in the process, which means whole program compilation and no plugins, or the language needs a way to make classes that cannot be subclassed, preventing code that the compiler didn’t see from subclassing T (https://en.wikipedia.org/wiki/Class_(computer_programming)#N...)
You can only do that if you can prove that T will never have subclasses that are larger.
Only if generic type parameters are covariant; if type parameters are invariant, this is not a problem. I'd guess that most languages with subtyping use invariance by default.
Also, many languages that do generics through monomorphization do not support subclassing (Rust, Haskell, etc.).
no plugins,
Well, there is always the option of boxing collection elements. (E.g. by using Box in Rust.)
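In Go terms, that boxing fallback means storing a uniform-size interface header per slot instead of the values inline, analogous to a Vec<Box<dyn Trait>> in Rust. A minimal sketch (the Shape types here are made up for illustration):

```go
package main

import "fmt"

// When element sizes could vary, the fallback is to box: each
// slot holds a uniform-size interface header pointing at a
// separately allocated value.
type Shape interface{ Area() float64 }

type Circle struct{ R float64 }
type Rect struct{ W, H float64 }

func (c Circle) Area() float64 { return 3.14159 * c.R * c.R }
func (r Rect) Area() float64   { return r.W * r.H }

func main() {
	// Circle and Rect have different sizes, but every slot in
	// this slice is the same size: an interface header.
	shapes := []Shape{Circle{R: 1}, Rect{W: 2, H: 3}}
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	fmt.Printf("%.2f\n", total) // 9.14
}
```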
In general, Haskell does not do parametric polymorphism through monomorphization. In particular, higher-rank polymorphism becomes unusable if polymorphism is implemented through monomorphization. On the other hand, if I recall correctly, Rust only supports higher-rank polymorphism for lifetimes, which neither have nor need a runtime representation. This is why monomorphization is a viable implementation strategy for Rust generics.
(Aside: Type checkers essentially see recursive function definitions as applications of a fixed point operator to non-recursive functions. If your function has a rank-1 type but uses polymorphic recursion, the type checker sees it as the application of a rank-2 fixed point operator to a non-recursive function. This is why I see polymorphic recursion as “morally higher-rank polymorphism”, even when the type signatures in your code are ostensibly rank-1 ones. Polymorphic recursion is widely used in Haskell.)
However, IMO, you only need rank-1 polymorphism 95% of the time anyway, so optimizing for the common use case is a good strategy. By far, the main use case for generics is implementing efficient and reasonably reusable data structures and algorithms in a reasonably type-safe way. For this use case, monomorphization and aggressive inlining of small functions are evidently the right things to do. Other uses of generics (say, streaming I/O frameworks) strike me as a lot more questionable.
Pointing to inlining and saying "problem solved" isn't really a solution, because inlining can lead to code bloat just like templates can; both end up generating the same specialized code.
For this to be an implementation detail, the compiler needs to be able to choose not to inline sometimes.