Scaling video encode to 112 CPU cores is hard. I haven't looked too hard into this encoder, but the normal way to scale that high is to encode entire segments in parallel. (YouTube in particular supposedly encodes each segment single-threaded, which is why libvpx has terrible scaling — its biggest user never needed it.) That effectively means encoding up to 112 independent 4K streams at once.
Each stream could need:
- one source frame
- additional source frames for reordering (3-7 is pretty normal)
- additional source frames for rate control (x264's default is 40)
- recon for the frame being encoded
- reference frames (IIRC AV1 allows up to 8 to be stored)
Plus MVs, modes, maybe subpel caches, etc.
That's easily 50-60 frames per stream; times maybe 112 streams, that's around 6,000 frames in flight. All of this is tunable, of course, especially with even a little intra-segment parallelism.
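For a sense of scale, here's a back-of-envelope sketch of that arithmetic. The per-stream buffer counts are the assumptions from the list above, not measurements, and 8-bit 4:2:0 frames are assumed:

```python
# Rough memory estimate for per-segment parallel 4K encoding.
# All parameter defaults are assumptions taken from the comment above.

def frames_per_stream(reorder=5, rc_lookahead=40, refs=8):
    """Frames held by one encoder instance: the source frame being coded,
    the reorder buffer, the rate-control lookahead, its recon, and refs."""
    return 1 + reorder + rc_lookahead + 1 + refs

def segment_memory_gib(streams=112, width=3840, height=2160,
                       bytes_per_pixel=1.5):  # 8-bit 4:2:0
    frame_bytes = width * height * bytes_per_pixel
    return streams * frames_per_stream() * frame_bytes / 2**30

print(frames_per_stream())          # → 55
print(round(segment_memory_gib()))  # → 71 (GiB)
```

So just the frame buffers land in the tens of gigabytes before counting MVs, mode info, or per-thread caches.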
I understand how an encoder could eat up that much memory and justify it in some way, but I can't buy that it's a necessity, or even acceptable in the long run (maybe this is stated to be in the prototype stage).
From what I've seen, AV1 breaks frames/segments up into a recursive partition tree (closer to a quadtree than a kd-tree) and brute-forces the leaves to find the coding choices that look the best at the smallest size. An oversimplification obviously, but with everything that encoders are doing, I still think it is naive to design them with such a simplistic view of concurrency that they have to be treated as a hundred small files for a hundred CPU cores.
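A toy version of that per-block search might look like the sketch below. To be clear about the assumptions: AV1's real partition tree recursively splits superblocks with several split shapes and scores candidates with an actual rate-distortion cost; the uniform-vs-mixed cost function here is entirely made up just to show the recursive keep-whole-or-split decision:

```python
# Toy recursive partition search: try coding a block whole, try splitting
# it into four quadrants, keep whichever is cheaper. The cost model is a
# stand-in (uniform blocks are cheap, mixed blocks are expensive), not a
# real rate-distortion measure.

MIN_SIZE = 4

def block_cost(detail, x, y, size):
    """Pretend cost of coding one size x size block whole."""
    vals = {detail[y + j][x + i] for j in range(size) for i in range(size)}
    return 10 if len(vals) == 1 else 2 * size * size

def best_partition(detail, x, y, size):
    """Return (cost, tree): tree is a leaf size or a list of 4 subtrees."""
    whole = block_cost(detail, x, y, size)
    if size <= MIN_SIZE:
        return whole, size
    half = size // 2
    quads = [best_partition(detail, x + dx, y + dy, half)
             for dy in (0, half) for dx in (0, half)]
    split = sum(c for c, _ in quads) + 4  # small split-signalling overhead
    return (split, [t for _, t in quads]) if split < whole else (whole, size)

# A 16x16 "frame": flat except a busy 4x4 corner.
detail = [[1 if (x < 4 and y < 4) else 0 for x in range(16)] for y in range(16)]
print(best_partition(detail, 0, 0, 16))  # → (78, [[4, 4, 4, 4], 8, 8, 8])
```

The point being: the search splits finely only where there's detail and stays coarse elsewhere, and each block decision is largely independent — which is exactly why treating whole segments as the only unit of parallelism seems like an unnecessarily coarse design.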