
> In a multithreaded program, a bump allocator requires locks. That kills their performance advantage.

Wait, what?

What's wrong with:

    #include <atomic>
    #include <cstddef>

    constexpr size_t HEAP_SIZE = 0x1000000;
    char theHeap[HEAP_SIZE];
    std::atomic<size_t> bumpPtr;

    void* bump_malloc(size_t size){
        size_t returnCandidate = bumpPtr.fetch_add(size, std::memory_order_relaxed);
        if(returnCandidate + size >= HEAP_SIZE){
            // Garbage collect. Super complicated, lets ignore it lol.
            // Once garbage collect is done, try to malloc again. If fail then panic().
        }
        return &theHeap[returnCandidate];
    }
------

You don't even need acquire/release ordering here, as far as I can tell. Even purely relaxed memory ordering seems to work, which means you definitely don't need locks or even memory barriers.

The exception is maybe the garbage-collect routine. Locking for garbage collection is probably reasonable (programmers already accept that garbage collection is heavy and may incur large running and synchronization costs), but keeping locks out of the hot path is the goal.

------

This is what I do with my GPU test programs, where locking is a very, very bad idea (though possible to do). Even atomics are kinda bad in the GPU world, but relaxed atomics are quite fast.

-------

> In Java, this requires 15 000 separate allocations, each producing a separate reference that must be managed.

Wait, what?

    Point[] array = new Point[15000];
This is a single allocation in Java. 15000 _constructors_ will be called IIRC (my Java is rusty though, someone double-check me on that), but that's not the same as 15000 allocations, not by a long shot.

------

Frankly, it seems like this poster is reasonably good at understanding Go performance, but doesn't seem to know much about Java performance or implementations.



    Point[] array = new Point[15000];
In Java, this would create an array of null references; to fill it up you need to create each object so that each Point is valid.
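To spell out the correction: the array is one allocation of 15000 references, and each element then needs its own `new`. A minimal sketch (the `Point` class here is a hypothetical stand-in, not from the article):

    // Hypothetical Point class for illustration.
    class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public class ArrayAlloc {
        public static void main(String[] args) {
            // One allocation: an array of 15000 references, all null.
            Point[] array = new Point[15000];
            assert array[0] == null;

            // 15000 further allocations, one object per element.
            for (int i = 0; i < array.length; i++) {
                array[i] = new Point(i, i);
            }
            assert array[14999].x == 14999;
        }
    }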


I stand corrected on that point.


Even with relaxed memory ordering, you’re still bouncing the cache line around between every CPU that tries to access bumpPtr. I would expect that to be significantly slower than using per-thread heaps (especially if you have a large number of CPUs).


Java bump-pointer allocations (i.e. normal allocations, with exceptions such as large array allocations) occur in a thread-local allocation buffer. Only when a thread exhausts its current allocation buffer does it need to worry about potentially contending with other threads to acquire a new one.
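The scheme above can be sketched in plain Java (class and field names are illustrative, not JVM internals; chunk-overflow of oversized requests is ignored): each thread reserves a big chunk from the shared heap with one atomic fetch-add, then bump-allocates inside that chunk with ordinary non-atomic arithmetic, so the hot path never touches the contended counter.

    import java.util.concurrent.atomic.AtomicLong;

    public class TlabSketch {
        static final long CHUNK = 1 << 16;
        // Shared heap top: only touched when a thread refills its buffer.
        static final AtomicLong heapTop = new AtomicLong();

        static final class Tlab {
            long cur, end;
            long allocate(long size) {
                if (cur + size > end) {             // buffer exhausted:
                    cur = heapTop.getAndAdd(CHUNK); // one contended atomic op
                    end = cur + CHUNK;
                }
                long addr = cur;                    // hot path: no atomics
                cur += size;
                return addr;
            }
        }
        static final ThreadLocal<Tlab> tlab = ThreadLocal.withInitial(Tlab::new);

        public static void main(String[] args) {
            Tlab t = tlab.get();
            long a = t.allocate(32);
            long b = t.allocate(32);
            assert b == a + 32; // consecutive allocations are adjacent
        }
    }

So the `bumpPtr`-style cache-line bouncing only happens once per 64 KiB chunk instead of once per allocation.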



