
Scaling video encode to 112 CPU cores is hard. I haven't looked too hard into this encoder, but the normal way to scale that high is to encode entire segments in parallel. (YouTube in particular supposedly encodes each segment single-threaded, which is why libvpx has such terrible thread scaling.) That effectively means encoding up to 112 independent 4K streams.
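A minimal sketch of that segment-parallel approach in Python; the segment names and the x264 invocation are hypothetical stand-ins, not this encoder's actual interface:

    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    def encode_segment(seg):
        # Each segment is an independent single-threaded encode,
        # so N cores can each chew on their own segment.
        subprocess.run(
            ["x264", "--threads", "1", "-o", seg + ".264", seg + ".y4m"],
            check=True,
        )
        return seg

    segments = [f"seg{i:04d}" for i in range(112)]
    with ProcessPoolExecutor(max_workers=112) as pool:
        done = list(pool.map(encode_segment, segments))
    # Concatenate/mux the finished segments afterwards.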

Each stream could need:

- one source frame

- additional source frames for reordering (3-7 is pretty normal)

- additional source frames for rate-control lookahead (x264's default is 40)

- the recon (reconstructed frame) for the frame being encoded

- reference frames (IIRC AV1 allows up to 8 to be stored)

Plus motion vectors, mode decisions, maybe subpel interpolation caches, etc.

That's easily 50-60 frames per stream. Multiply by ~112 streams and you're at roughly 6,000 frames in flight. Easily tunable of course, especially with even a little intra-segment parallelism.
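To put a number on that, a quick back-of-the-envelope (assuming 8-bit 4:2:0 buffers; 10-bit would roughly double it):

    width, height = 3840, 2160      # 4K
    bytes_per_pixel = 1.5           # 8-bit YUV 4:2:0: full-res Y, quarter-res U and V
    frame_bytes = width * height * bytes_per_pixel   # ~11.9 MiB per frame

    frames_per_stream = 55          # midpoint of the 50-60 estimate above
    streams = 112
    total = frame_bytes * frames_per_stream * streams
    print(total / 2**30)            # ~71 GiB, before MVs, modes, caches, etc.

So tens of gigabytes fall out of the frame counts alone.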



I understand how an encoder could eat up that much memory and justify it in some way, but I can't buy that it's a necessity, or even acceptable in the long run (though maybe this is stated to be at the prototype stage).

From what I've seen, AV1 breaks frames up into superblocks that are recursively partitioned, quadtree-style (not a kd-tree), and brute-forces the leaves to find the prediction and transform that look best at the smallest size. An oversimplification obviously, but with everything encoders are doing, I still think it is naive to design them with such a simplistic view of concurrency that they have to be treated as a hundred small files for a hundred CPU cores.
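As a rough illustration of that leaf search, here's a toy rate-distortion recursion over a quadtree; rd_cost_leaf and SPLIT_OVERHEAD are made-up placeholders, not AV1's actual cost model:

    from dataclasses import dataclass

    SPLIT_OVERHEAD = 1.0  # made-up rate cost for signalling a split

    @dataclass
    class Block:
        x: int
        y: int
        size: int

        def quadrants(self):
            h = self.size // 2
            return [Block(self.x + dx, self.y + dy, h)
                    for dy in (0, h) for dx in (0, h)]

    def rd_cost_leaf(block):
        # Placeholder for the real work: trying prediction modes and
        # transforms, scoring distortion + lambda * rate.
        return float(block.size)

    def search_block(block, min_size=8):
        # Cost of coding the block as a single leaf...
        best = rd_cost_leaf(block)
        if block.size > min_size:
            # ...versus splitting into four quadrants and recursing.
            split_cost = SPLIT_OVERHEAD + sum(
                search_block(q, min_size) for q in block.quadrants())
            best = min(best, split_cost)
        return best

    print(search_block(Block(0, 0, 128)))  # one 128x128 superblock

Each superblock's search is largely independent of its neighbours (modulo prediction context), which is exactly the kind of intra-frame parallelism that doesn't require treating the input as a hundred separate files.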



