How is it cheaper? In both formulations, you end up with one swap and one RNG call per element of the vector. With the shuffle vector you pay those costs up front; with swap-with-last you pay them lazily. Asymptotically there's no difference at all.
> This would basically be distributing the Fisher-Yates shuffle over the iterations, rather than running it when the pieces run out.
This obviously makes no difference for a sequence of seven tetrominoes, but perhaps there are contexts where one would do similar operations over very large sequences, in which case paying the costs lazily could be nice.
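To make "distributing the shuffle over the iterations" concrete, here is a minimal sketch of what I have in mind. The name `lazy_shuffle` and the `rng` parameter are made up for illustration, and I'm assuming the forward variant of Fisher-Yates, where slot `i` is swapped with a random slot at or after it:

```python
import random

def lazy_shuffle(items, rng=random):
    """Yield the elements of `items` in random order, shuffling in place.

    Hypothetical helper: one RNG call and one swap per element, paid only
    when the caller actually asks for that element.
    """
    n = len(items)
    for i in range(n):
        # Pick uniformly among the slots that have not been fixed yet.
        j = rng.randrange(i, n)
        items[i], items[j] = items[j], items[i]
        yield items[i]

bag = list("IJLOSTZ")
drawn = list(lazy_shuffle(bag, random.Random(0)))
```

If the caller consumes the whole generator, the total work is the same as an up-front shuffle; if it stops after k elements, only k swaps and k RNG calls have been paid for.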
EDIT: I just remembered that Go deliberately randomizes the iteration order of its built-in maps (the spec leaves the order unspecified). I wonder how that is implemented; perhaps something like this might be useful there for situations where one often bails out of such loops early.
Since it's too late to edit: nope, that won't work for Go maps. It only shuffles the elements already visited, so after bailing out of the loop early, the first N slots hold exactly the N elements that were iterated over up to that point, and everything after them is still unshuffled. That breaks the intended randomness.
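A quick sketch of that early-exit behavior, again assuming the forward swap variant (`lazy_shuffle` is a hypothetical helper, not anything Go does internally):

```python
import random
from itertools import islice

def lazy_shuffle(items, rng=random):
    # Hypothetical helper: forward Fisher-Yates, one swap per yielded element.
    n = len(items)
    for i in range(n):
        j = rng.randrange(i, n)
        items[i], items[j] = items[j], items[i]
        yield items[i]

bag = list("IJLOSTZ")
# Bail out after three elements, like breaking out of a loop early.
visited = list(islice(lazy_shuffle(bag, random.Random(0)), 3))
front = bag[:3]  # the three visited elements, in visit order
rest = bag[3:]   # the four elements that were never yielded
```

After the early exit, `front` equals `visited`, and `rest` holds the elements that were never drawn.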
OTOH, that actually is kind of an interesting feature; there might be a few situations where that is desired behavior.