Is there any blocker to having VRAM swap (to RAM or SSD)? It would make processing much slower, but it should be better than nothing (an OOM crash), or than the alternative of running on the CPU (slower still).
Not sure. I suspect the issue would be the large amount of memory transfer between the GPU and the CPU, since downstream layers usually need the previous layers' outputs. It would probably depend on the receptive field of the network, and on how expensive memory transfer is; it might be worth it in some cases. But there's no reason you couldn't run, say, the first big layers on the CPU and then treat the deeper layers (which may take a smaller input) as a separate network to run on the GPU. I suppose you want the largest subgraph of your model that can fit in available VRAM. The Coral/EdgeTPU certainly dispatches unsupported operations to the CPU, but that affects all ops beyond that point in the computation graph.
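As a toy sketch of that split-point idea: because each layer depends on the previous one's outputs, a single CPU/GPU split keeps you to one host-to-device copy per forward pass, and you want the largest suffix of the network that fits in your VRAM budget. The layer names, per-layer memory figures, and budget below are all made up for illustration:

```python
# Hypothetical per-layer memory footprints in MB (weights + activations).
layer_mem_mb = [("conv1", 900), ("conv2", 600), ("conv3", 300), ("fc", 100)]
VRAM_BUDGET_MB = 800  # assumed available VRAM; not a real device figure

def split_point(layers, budget):
    """Return the index of the first layer to run on the GPU.

    Walks backwards from the output, growing the GPU suffix while its
    total memory still fits in the budget. Everything before that index
    runs on the CPU, so only one activation transfer is needed.
    """
    total = 0
    for i in range(len(layers) - 1, -1, -1):
        if total + layers[i][1] > budget:
            return i + 1  # this layer no longer fits; GPU part starts after it
        total += layers[i][1]
    return 0  # the whole network fits on the GPU

idx = split_point(layer_mem_mb, VRAM_BUDGET_MB)
cpu_layers = [name for name, _ in layer_mem_mb[:idx]]
gpu_layers = [name for name, _ in layer_mem_mb[idx:]]
print(cpu_layers, gpu_layers)  # ['conv1', 'conv2'] ['conv3', 'fc']
```

In a real framework you would then place the two halves on their devices (e.g. `.to("cuda")` in PyTorch) and copy the intermediate activation once at the split. A more general model would also weigh the transfer cost of the activation at each candidate split, not just whether the suffix fits.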