
Is the shuffle vector a novel data structure? I've never seen it before but it looks so obvious that just glancing at figure three was enough to figure out how it worked before I got to the description: it's basically a vector with a Fisher-Yates[0] shuffle built in! I wonder what other uses they might have.

[0] https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle



It seems very closely related to the well-known trick for deleting elements from the middle of an array in constant time (swap the last element with the element to be removed, then pop the last element). The difference is just that here the swap happens on insertion rather than on deletion.
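That trick is only a few lines (Python here, purely as an illustration):

```python
def swap_remove(arr, i):
    """Remove arr[i] in O(1) by swapping it with the last element
    and popping. Note: the array's order is not preserved."""
    arr[i], arr[-1] = arr[-1], arr[i]
    return arr.pop()

xs = [10, 20, 30, 40]
swap_remove(xs, 1)  # returns 20; xs is now [10, 40, 30]
```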


Yes, kind of, but that trick is more for when you have an unsorted array and don't mind making it even less sorted.

Shuffle vectors guarantee randomized sampling of a known set of elements very cheaply, with the ability to return elements to the set. That seems like it could have some very interesting niche uses.
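A minimal sketch of the idea, as I understand it (the name and API here are my own, not necessarily the paper's): a stack that swaps on insertion, so that popping is O(1) and already randomized, and returning an element to the set is just another push.

```python
import random

class ShuffleVector:
    """Sketch (my own interpretation) of a shuffle vector: push() swaps
    the new element into a random slot, so pop() returns a randomized
    element in O(1). Returning an element to the set is just a push."""

    def __init__(self, items=()):
        self.items = []
        for x in items:
            self.push(x)

    def push(self, x):
        self.items.append(x)
        j = random.randrange(len(self.items))
        # Swap the newcomer into a random position; whatever was there
        # moves to the top, ready to be popped next.
        self.items[-1], self.items[j] = self.items[j], self.items[-1]

    def pop(self):
        return self.items.pop()
```

I haven't checked whether this matches the paper's exact layout, but it captures the "swap on insertion" point from the comment above.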

It kind of reminded me of modern Tetris implementations. Instead of picking one tetromino at random for the next piece, they generate random sequences containing each tetromino once. However, that context never requires returning a single element to the set of options - you can just generate a new sequence as a whole once you run out of pieces.

Although... you could "flip" the shuffle vector by swapping with the elements already taken: make a small buffer of seven elements (one for each piece), circularly iterate over it, and whenever you reach element i, swap the value with a random position in the inclusive range from zero to i. This would basically be distributing the Fisher-Yates shuffle over the iterations, rather than running it when the pieces run out.
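One way to read that idea in code (my interpretation: yield the current slot, then swap it with a random position in [0, i], so each full pass over the buffer has received a complete forward Fisher-Yates shuffle and emits a permutation of the pieces):

```python
import random

def distributed_bag(pieces):
    """Circularly walk a small buffer; after yielding slot i, swap it
    with a random position in the inclusive range [0, i]. By the end of
    each pass the buffer has been given a full (forward) Fisher-Yates
    shuffle, so every pass yields each piece exactly once."""
    buf = list(pieces)
    while True:
        for i in range(len(buf)):
            yield buf[i]
            j = random.randint(0, i)  # inclusive range [0, i]
            buf[i], buf[j] = buf[j], buf[i]
```

Note that the very first pass comes out in the buffer's original order, since the swaps only touch already-visited slots - which is exactly the early-bailout caveat raised further down the thread.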

For some reason though, I doubt that Tetris piece randomization was ever a source of performance instability for these games.


How is it cheaper? In both formulations, you end up with one swap and one RNG call per operation on the vector. With the shuffle vector you pay those costs up front, with swap-with-last you pay those costs lazily. Asymptotically there's no difference there at all.


It's not, that's what I meant with:

> This would basically be distributing the Fisher-Yates shuffle over the iterations, rather than running it when the pieces run out.

This obviously makes no difference for a sequence of seven tetrominoes, but one can imagine contexts with similar operations over very large sequences, in which case paying the costs lazily could be nice.

EDIT: I just remembered that Go guarantees that iteration over the built-in maps happens in random order. I wonder how that is implemented - perhaps something like this might be useful there, for the situations where one often bails out of such loops early.


Since it's too late to edit: nope, that won't work for Go maps, because it only shuffles elements already visited, so bailing out of the loop early guarantees the first N elements will be the same N elements that have been iterated over up until that point. Which breaks the randomness guarantee.

OTOH, that actually is kind of an interesting feature; there might be a few situations where that is desired behavior.


Isn't that roughly why a typical stream cipher like RC4 is fast, as opposed to CBC mode?




