Why is it so different? I'd be curious to know. In my experience, a stop-the-wor...

enneff · on March 19, 2012

Java requires a powerful, sophisticated garbage collector because it is extremely difficult (and in many cases impossible) to write Java programs that don't generate a lot of garbage. Even many of the core APIs are allocation heavy. It's a pain.

Go data structures tend to be much smaller than the Java equivalents, and it is much easier to track down and eliminate unnecessary allocations in Go code. When you have better control over allocations you don't need to lean so hard on the garbage collector. (Some of this is touched on here: http://loadcode.blogspot.com.au/2009/12/go-vs-java.html)

You seem to have a lot of strong opinions (and predictions!) about Go, when you clearly don't have much experience with it.

ootachi · on March 19, 2012

Go gives you slightly better control over allocation than Java (in that you have a choice to allocate on the stack and inside other data structures -- but keep in mind that escape analysis can give you this too; see the optimizations in the Jikes RVM). However, in Go, you still have no choice but to allocate on the heap in many instances (for example, when returning a data structure from a function, or when using maps). You're taking a big bet that programs will be able to use the stack and manually use free lists or whatever (which you can still do in Java!), and that the slow performance of the GC won't be a problem. I wouldn't take that bet.
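
For readers unfamiliar with the "free list" technique mentioned above, a minimal single-goroutine sketch in Go might look like the following. The buffer and freeList types are hypothetical names chosen for this example, and the list is not safe for concurrent use.

```go
package main

import "fmt"

// buffer stands in for any object that is expensive to allocate repeatedly
// (hypothetical type for this sketch).
type buffer struct {
	data [4096]byte
}

// freeList recycles buffers instead of handing them back to the garbage
// collector. It is not safe for concurrent use.
type freeList struct {
	free []*buffer
}

// get returns a recycled buffer if one is available and allocates otherwise.
func (l *freeList) get() *buffer {
	if n := len(l.free); n > 0 {
		b := l.free[n-1]
		l.free = l.free[:n-1]
		return b
	}
	return new(buffer)
}

// put hands a buffer back for reuse. The caller must not touch it afterwards.
func (l *freeList) put(b *buffer) {
	l.free = append(l.free, b)
}

func main() {
	var pool freeList
	first := pool.get() // allocates: the list is empty
	pool.put(first)
	second := pool.get() // reuses the buffer that was just returned
	fmt.Println("reused the same buffer:", first == second)
}
```

The same pattern can be written in Java, which is the point being made: a free list reduces how often the collector runs, but the recycled objects stay live and still have to be traced when it does.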

Besides, once you have a stop-the-world multithreaded GC, it has to trace all the roots in the program for correctness. There's no way around that. At that point having all the data on the stack doesn't help you (except for cache locality, but Java's GC already does that via the nursery and copying during major collections). You must still trace every pointer in the object graph, while keeping all threads suspended. You can run the GC less often, but that's not much of a help when interactive performance is at stake. iOS feels so great because the UI is always running at 60 frames per second. Stop-the-world GC can't do that.

dsymonds · on March 19, 2012

We don't bet, we experiment, measure and analyse. So far, I have found memory allocation in Go to be extremely predictable and controllable, and the GC behaviour likewise predictable.

I don't know why you claim that "having all the data on the stack doesn't help you". That makes no sense. If data is on the stack, any references starting there disappear as soon as that stack frame is popped, so there's no lingering work for the GC to do; the GC arena is unchanged.

Your comments strongly imply that you have no practical experience with Go's GC. I suggest you stop claiming that it has certain performance characteristics or behaviours when you have not experienced it yourself.

ootachi · on March 19, 2012

"If data is on the stack, any references starting there disappear as soon as that stack frame is popped, so there's no lingering work for the GC to do"

You're only considering the sweep phase. Sweep is the easy part of tracing GC - you can always chuck sweep into a background thread. The mark phase is the problem. Marking always has to trace roots, including stack roots - that's how GC works. Tracing GC never traces dead objects.

You can reduce allocation pressure by using the stack, which will make the GC run less often, but my point is that when it does run you're no better off than Java, and quite a bit worse since Java's GC can run in parallel with the mutator and Go's can't.

"Your comments strongly imply that you have no practical experience with Go's GC. I suggest you stop claiming that it has certain performance characteristics or behaviours when you have not experienced it yourself."

I have experience with GC generally. There's nothing particularly special about Go's GC: it's a standard stop-the-world mark-and-sweep collector for a language that supports a limited form of stack allocation but generally uses heap allocation. The performance characteristics that this form of GC must have are well-known.

If you want data, look at the binary-trees benchmark: http://shootout.alioth.debian.org/u32/benchmark.php?test=all...

It's mostly a test of GC. Java's GC runs more often (thus the memory use is lower) and yet it's still 4x faster. This is because Java has a generational, concurrent-incremental collector.