It sounds like there are forks that can run on cards with <=8GB. I'm not sure, but I think the weights are stored as f32, so switching to half precision might make it even easier to get this working with less memory.
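For a rough sense of the f32 vs. half difference, here's a back-of-the-envelope sketch. The parameter count is a made-up placeholder, not the real size of this model, and it only counts the weights themselves (activations and other buffers come on top).

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

# Hypothetical parameter count, chosen only for illustration.
NUM_PARAMS = 1_000_000_000

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # float32: 4 bytes per weight
fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # float16: 2 bytes per weight

print(f"fp32: {fp32_gib:.2f} GiB, fp16: {fp16_gib:.2f} GiB")
```

Whatever the real parameter count is, going from 4 bytes to 2 bytes per weight halves the weight footprint.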
But yeah, the next generation of models will probably capitalize on more memory somehow.