
Do they depend on CUDA, or are they just much better tuned for NVIDIA cards? I thought the whole ML ecosystem was based on training models and then running them on frameworks, where the model is sorta like data and the framework handles the hardware (albeit with models that can be tweaked to run more efficiently on different hardware). I don't really know the ecosystem, so it's definitely possible that they are more closely tied together than I thought.


The latter. The major frameworks, at least, can be run in CPU-only mode, with a hardware abstraction layer for other devices (like CUDA-capable cards, TPUs, etc). In practice this means you need an Nvidia GPU to get anywhere in a reasonable amount of time, but if you're not super dependent on latency (for inference) then CPU is an option. In principle, CPUs can run much bigger model inputs (at the expense of even more latency), because RAM is typically an order of magnitude more plentiful than VRAM.
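In PyTorch, for example, that abstraction is just the device a tensor or module lives on; the same code runs on a CUDA card if one is present and falls back to CPU otherwise. A minimal sketch (the layer sizes are arbitrary, just for illustration):

```python
import torch

# Pick the best available device; identical model code runs either way.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(512, 10).to(device)
x = torch.randn(4, 512, device=device)

with torch.no_grad():
    out = model(x)

print(out.shape)  # torch.Size([4, 10])
```

The model weights here are the "data"; `.to(device)` is the only hardware-specific line, which is what makes the CPU fallback essentially free to support.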


I was thinking (as someone who knows nothing about this really) that the Apple chips might be interesting because, while they obviously don't have the GPGPU grunt to compete with NVIDIA, they might have a more practical memory:compute ratio... depending on the application of course.


Is there any blocker to having VRAM swap (to RAM or SSD)? It would make processing much slower, but it should be better than nothing (i.e., an OOM crash), or than the alternative of running everything on the CPU (slower still).


Not sure. I suspect the issue would be lots of memory transfer between the GPU and the CPU, because downstream layers usually need previous layers' outputs. It would probably depend on the receptive field of the network, and on how expensive memory transfer is; maybe it's worth it in some cases. But there's no reason why you couldn't run, say, the first big layers on the CPU and then treat the deeper layers (which may take a smaller input) as a separate network to run on the GPU. I suppose you'd want the largest subgraph of your model that can fit in available VRAM. Certainly the Coral/EdgeTPU will dispatch unsupported operations to the CPU, but that affects all ops beyond that point in the computation graph.
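The CPU/GPU split described above is easy to sketch in PyTorch by placing submodules on different devices and moving the activation once at the boundary (the layer widths below are made up; the point is the single host-to-device transfer):

```python
import torch
import torch.nn as nn

# Falls back to CPU-only when no CUDA device exists, so the sketch still runs.
gpu = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Big early layers stay in host RAM; smaller deep layers go to the GPU.
early = nn.Sequential(nn.Linear(4096, 1024), nn.ReLU())       # CPU
late = nn.Sequential(nn.Linear(1024, 64), nn.ReLU()).to(gpu)  # GPU if available

x = torch.randn(2, 4096)  # large input, never touches VRAM
h = early(x)              # CPU compute
h = h.to(gpu)             # the one CPU->GPU transfer at the subgraph boundary
y = late(h)

print(y.shape)  # torch.Size([2, 64])
```

Because the transfer happens only once, at the point where the activation is already small, the memory-traffic cost the comment worries about is kept to a minimum.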


In my experience the bigger frameworks may have support for non-CUDA devices (that is, more than just the CPU fallback), but many smaller libraries and models will not, and will only ship a CUDA kernel for some specialized operation.

I encounter this all the time in computer vision models.
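When a library only ships a CUDA kernel, the failure shows up as a runtime error the moment the op is called on a CPU tensor. A hedged sketch of the defensive pattern this forces on you (`run_op` and the fallback policy are hypothetical, not any library's API):

```python
import torch

def run_op(x, op):
    """Try an op on the tensor's current device; if only a CUDA kernel
    exists (common in smaller vision libraries), retry on GPU or re-raise."""
    try:
        return op(x)
    except (RuntimeError, NotImplementedError):
        if torch.cuda.is_available():
            # Hypothetical mitigation: bounce through the GPU and back.
            return op(x.cuda()).cpu()
        raise  # no CPU kernel and no GPU: nothing to fall back to

x = torch.randn(3, 3)
y = run_op(x, torch.relu)  # relu has a CPU kernel, so this succeeds directly
```

The annoyance is that none of this is visible from the model definition; you only discover the CUDA-only op when inference fails on a CPU-only machine.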




