The latter. The major frameworks, at least, can be run in CPU-only mode, with a hardware abstraction layer for other devices (CUDA-capable cards, TPUs, etc.). So in practice you need an Nvidia GPU to get anywhere in a reasonable amount of time, but if you're not especially latency-sensitive (for inference) then CPU is an option. In principle, a CPU can run much bigger model inputs (at the cost of even more latency) because RAM is typically an order of magnitude more plentiful than VRAM.
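For example, in PyTorch (assuming that's the framework in question) the standard idiom is to pick the device at runtime and fall back to CPU; the tiny `nn.Linear` here is just a stand-in for a real model:

```python
import torch
import torch.nn as nn

# Pick the best available device; falls back to CPU when no CUDA card is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 4).to(device)       # toy stand-in for a real model
x = torch.randn(1, 16, device=device)     # input tensor on the same device

with torch.no_grad():                     # inference only, no autograd overhead
    y = model(x)

print(y.shape)  # same result shape whichever device ran it
```

The same code runs unchanged on CPU-only machines; only the wall-clock time differs.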
I was thinking (as someone who knows nothing about this really) that the Apple chips might be interesting because, while they obviously don't have the GPGPU grunt to compete with NVIDIA, they might have a more practical memory:compute ratio... depending on the application of course.
Is there any blocker to having VRAM swap out to RAM or SSD? It would make processing much slower, but that beats nothing at all (an OOM crash), and might beat falling back entirely to the CPU (slower still).
Not sure. I suspect the issue would be lots of memory transfer between the GPU and the CPU, because downstream layers usually need previous layer outputs. It would probably depend on the receptive field of the network? Also on how expensive memory transfer is; maybe it's worth it in some cases. But there's no reason why you couldn't run, say, the first big layers on the CPU and then treat deeper layers (which may take a smaller input) as a separate network to run on the GPU. I suppose you want the largest subgraph in your model that can fit in available VRAM. Certainly the Coral/EdgeTPU will dispatch unsupported operations to the CPU, but that affects all ops beyond that point in the computation graph.
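A minimal sketch of that split in PyTorch (hypothetical layer sizes; on a machine with no GPU both halves just land on the CPU):

```python
import torch
import torch.nn as nn

gpu = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Big early layers stay on the CPU, where RAM is plentiful...
front = nn.Sequential(nn.Linear(4096, 512), nn.ReLU())

# ...while the smaller deep layers run on the GPU as a separate subgraph.
back = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 10)).to(gpu)

x = torch.randn(8, 4096)   # batch lives in ordinary RAM
with torch.no_grad():
    h = front(x)           # CPU compute on the big input
    h = h.to(gpu)          # the one host-to-device transfer per forward pass
    y = back(h)            # GPU compute on the reduced representation

print(y.shape)
```

Because the activation crossing the boundary (512 floats per sample here) is much smaller than the input, the transfer cost stays low; whether the split pays off depends on exactly the transfer-versus-compute trade-off described above.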