I’ll start with some background to explain why I’m asking this particular question. I have an HPC compute server whose entire purpose is to run research calculations with CUDA acceleration. I upgraded the server from Ubuntu 12.04 to Ubuntu 14.04 and ran into a major issue with the drivers. If it helps, this server does not have a desktop installed, and the card that drives the video output is not one of the cards used for calculations.
I was using the .deb install, which sets up a CUDA PPA, together with the drivers provided by the Ubuntu repositories. Through this PPA I was able to install several different versions of the CUDA toolkit, along with numerous versions of the Nvidia drivers. In all, I tried dozens of combinations of Nvidia driver + CUDA version. In every case, I was able to install CUDA and the graphics driver and compile the software I use (Gromacs), but I got catastrophic errors when it came time to actually run the software. In numerous cases, the failure locked up the system and required a reboot. I also tried combining the .run file installation with the drivers provided by the Ubuntu repositories, with similar results. There are many, many possible points of failure here, so I wasn’t pointing fingers at any one thing.
Fortunately, I stumbled across a solution that seems to work: I used the .run file installation together with the driver bundled in that same package. It worked perfectly. That brings me to my current problem.
The CUDA toolkit installs a driver that ONLY works with the kernel version that is running when you install it. Ubuntu sometimes updates its kernel as often as once a week. Right now, my workaround is to modify the grub settings to always boot the kernel version I had installed when I finally got CUDA working. In the long run this is bad practice, since the kernel updates are primarily for security. I’m wondering if anyone knows a relatively simple way to rebuild or reinstall this same version of the driver for a new kernel. I’d prefer to stick with ONE version of the driver, which helps ensure that my research results are as reproducible as possible. Is there perhaps an option in the .run file installation that would install ONLY the driver and not the rest of CUDA? I’m a bit afraid of just trying the .run file again myself, as it took me days to stumble across a combination that actually worked. I’d really be unhappy if I ended up breaking things all over again by poking around in the installer with no idea what I’m doing.
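To make the kernel dependency concrete: as far as I understand, the driver's kernel module is built against one specific kernel and ends up under that kernel's directory in /lib/modules. Below is a minimal Python sketch (a diagnostic only, not a fix) that lists which installed kernels have such a module. The function name kernels_with_nvidia_module is made up for illustration, and it assumes that the .run installer names the module exactly nvidia.ko and places it somewhere under /lib/modules/<kernel release>; that may not hold for every driver version or for the Ubuntu-packaged drivers.

```python
import os
import platform


def kernels_with_nvidia_module(modules_root="/lib/modules"):
    """Return {kernel_release: bool} for every kernel tree under modules_root.

    The value is True when a file named exactly "nvidia.ko" exists anywhere
    in that kernel's module tree. ASSUMPTION: the .run installer uses this
    file name; the stock framebuffer module "nvidiafb.ko" is deliberately
    not counted, since it is unrelated to the CUDA driver.
    """
    result = {}
    if not os.path.isdir(modules_root):
        return result
    for release in sorted(os.listdir(modules_root)):
        tree = os.path.join(modules_root, release)
        if not os.path.isdir(tree):
            continue
        found = False
        for _dirpath, _dirnames, filenames in os.walk(tree):
            if "nvidia.ko" in filenames:
                found = True
                break
        result[release] = found
    return result


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    for release, has_module in kernels_with_nvidia_module().items():
        marker = " (running)" if release == running else ""
        status = "nvidia.ko present" if has_module else "no nvidia.ko"
        print(f"{release}{marker}: {status}")
```

If a newly installed kernel shows up without the module, that would match what I’m seeing: the driver only exists for the kernel it was installed under.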
Thanks for any suggestions.