I'm using CUDA under Ubuntu, and have noticed that the CUDA library uninstalls i...

ktm5j · on Nov 24, 2019

You probably installed a kernel update and the nvidia kernel module didn't get rebuilt for it. You can avoid reinstalling the whole driver package by just running "dkms autoinstall" and then "modprobe nvidia" (both as root).

You may first need to unload any nvidia modules that are still loaded (built for an older kernel). Remove the ones that depend on nvidia first, e.g. "rmmod nvidia_drm", "rmmod nvidia_modeset", "rmmod nvidia_uvm", then "rmmod nvidia", and then run dkms.
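A minimal sketch of that recovery sequence as shell functions, assuming it runs as root on a machine where the nvidia module is managed by DKMS. The function names `nvidia_unload_order` and `rebuild_nvidia_module` are hypothetical, it only handles the four modules named above, and setting `DRY_RUN=1` prints the commands instead of running them.

```sh
# Hypothetical helpers for rebuilding the nvidia kmod after a kernel update.
# Source this file, then run rebuild_nvidia_module as root.

# Read `lsmod` output on stdin and print the NVIDIA modules that are loaded,
# in an order rmmod can take one at a time: nvidia_drm before
# nvidia_modeset (which nvidia_drm uses), and nvidia itself last.
nvidia_unload_order() {
  local loaded mod
  loaded=$(awk 'NR > 1 { print $1 }')   # skip the lsmod header line
  for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
    if printf '%s\n' "$loaded" | grep -qx "$mod"; then
      echo "$mod"
    fi
  done
}

# Unload stale modules, rebuild via DKMS, then load the fresh module.
# With DRY_RUN set to a non-empty value, only print the commands.
rebuild_nvidia_module() {
  local run m
  run=""
  if [ -n "${DRY_RUN:-}" ]; then run="echo"; fi
  for m in $(lsmod | nvidia_unload_order); do
    # rmmod fails if the module is still in use (e.g. by a running X server)
    $run rmmod "$m" || return 1
  done
  $run dkms autoinstall || return 1
  $run modprobe nvidia
}
```

If any `rmmod` fails, the function stops before touching dkms, so you can stop whatever is holding the module (usually the display manager) and try again.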

I run a ~1000-node server room for a computer science graduate program at a university, and keeping these drivers built and loaded properly has been a nightmare! Nvidia really needs to get things worked out if they want to keep pushing the GPGPU stuff.

mroche · on Nov 24, 2019

What distro are you running in your labs? My university’s cluster runs RHEL 6 (the same applies to RHEL 7; hopefully they managed the upgrade over the summer), and all that’s needed there is to install dkms, then the CUDA repo from NVIDIA, which includes the driver and CUDA packages. Any kernel update will rebuild the kmod on reboot. I’m not 100% certain whether that repo is for Tesla cards only (which is what we had), but ELRepo also has the generic driver and associated bits (as does negativo, but its packages are very granular).

DKMS is really the only piece that’s necessary to keep the system running (other than keeping an eye on which kernel version the kmods were built against if you use a non-NVIDIA repo). DKMS also works just fine with the NVIDIA-provided installer; just make sure you have the libglvnd bits installed before you install the driver.
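To check which kernel DKMS actually built the nvidia kmod for, a rough sketch that parses `dkms status`. It assumes the two layouts that command has used (older dkms prints `nvidia, 435.21, 5.0.0-36-generic, x86_64: installed`, newer releases print `nvidia/435.21, ...`); other versions may format it differently, and the function name is made up.

```sh
# Hypothetical helper: read `dkms status` output on stdin and succeed only if
# an nvidia module is *installed* (not just built) for the kernel given as $1.
nvidia_installed_for_kernel() {
  awk -v k="$1" '
    { sub(/^nvidia\//, "nvidia, ") }   # normalise the newer "nvidia/VER," layout
    $1 == "nvidia," && $3 == (k ",") && /: installed/ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Usage: `dkms status | nvidia_installed_for_kernel "$(uname -r)" || echo "no nvidia kmod for $(uname -r)"`.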

proverbialbunny · on Nov 24, 2019

There is an nvidia dev apt repo that is great for installing and setting CUDA up because apt does all the heavy lifting for you. I've never had any problems with it.

First, an optional prerequisite for updated video drivers:

```
sudo apt-get install -y software-properties-common && \
  sudo add-apt-repository -y ppa:graphics-drivers/ppa && \
  sudo apt-get update && \
  sudo apt-get install -y nvidia-driver-NNN
```

(Replace NNN with the driver version you want, e.g. nvidia-driver-435.)

This is sometimes a required prerequisite, because these drivers include both 32-bit and 64-bit binaries, whereas the ones from nvidia's website or the normal apt packages only have the 64-bit drivers. (E.g., they're required for Steam and many video games, which will suddenly stop working when CUDA is installed.)
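After a reboot, one way to confirm that the kernel module that actually loaded comes from the branch you installed (the NNN in nvidia-driver-NNN) is to read `/proc/driver/nvidia/version`. A rough sketch, assuming the `NVRM version: ... Kernel Module  435.21 ...` line format that file uses; the function names are hypothetical.

```sh
# Hypothetical helpers. Read the contents of /proc/driver/nvidia/version on
# stdin and print the loaded kernel module's version, e.g. "435.21".
loaded_nvidia_version() {
  awk '/^NVRM version:/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]+\.[0-9]+/) { print $i; exit }
       }'
}

# Succeed if the loaded module's major version equals $1
# (e.g. 435 for the nvidia-driver-435 package).
loaded_driver_matches_branch() {
  local version
  version=$(loaded_nvidia_version)
  [ -n "$version" ] && [ "${version%%.*}" = "$1" ]
}
```

Usage: `loaded_driver_matches_branch 435 < /proc/driver/nvidia/version && echo "435 driver loaded"`.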

Then there is CUDA itself:

```
# `sudo tee` writes the list file as root; a plain "> file" redirect would not.
sudo apt-get install -y gnupg2 curl ca-certificates && \
  curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubu... | sudo apt-key add - && \
  echo "deb https://developer.download.nvidia.com/compute/machine-learni... /" | sudo tee /etc/apt/sources.list.d/nvidia-ml.list > /dev/null && \
  sudo apt-get update && \
  sudo apt-get install -y nvidia-cuda-toolkit libcudnn7 libcudnn7-dev
```

(libcudnn7 and libcudnn7-dev are only needed if you're also using cuDNN.)
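Once the packages are in, a quick sanity check is to ask `nvcc` which CUDA release it reports, so you know which toolkit actually ended up on your PATH. A small sketch, assuming the `Cuda compilation tools, release X.Y, VX.Y.Z` line that `nvcc --version` prints; the helper names are made up.

```sh
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# CUDA release number, e.g. "10.1".
cuda_release() {
  sed -n 's/.*Cuda compilation tools, release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Print the CUDA release reported by nvcc, or fail if nvcc is not on PATH.
check_cuda() {
  command -v nvcc >/dev/null 2>&1 || { echo "nvcc not found" >&2; return 1; }
  nvcc --version | cuda_release
}
```

Usage: `check_cuda` prints something like `10.1`, or complains on stderr if the toolkit didn't install an `nvcc`.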

Also note: NVIDIA currently doesn't officially support Ubuntu 19, but their 18.04 repo works perfectly for 19. In the future you can always try grabbing from https://developer.download.nvidia.com/compute/cuda/repos/ubu... instead.