Hi, we tried running an application compiled with the CUDA 12.6 toolkit (toolkit driver version 560.35.05) on a handful of machines with different GPUs but the same driver version, 535.183.01.
The RTX 2060 and GTX 1060 failed with cudaErrorCompatNotSupportedOnDevice ('forward compatibility attempted on non supported HW').
However, the documentation says:
"applications compiled with a CUDA Toolkit release from within a CUDA major release family can run, with limited feature-set, on systems having at least the minimum required driver version (>=525.60.13)" (from: 1. Why CUDA Compatibility, CUDA Compatibility r555 documentation)
Provided I understood it right, I assumed it means that there shouldn’t be any issues running CUDA 12.6 applications on systems with the 535 driver. Is that correct?
The worst I imagined was getting cudaErrorCallRequiresNewerDriver, due to the aforementioned limited feature-set across minor versions:
To use other CUDA APIs introduced in a minor release (that require a new driver), one would have to implement fallbacks or fail gracefully.
…
A new error code is added to indicate that the functionality is missing from the driver you are running against: cudaErrorCallRequiresNewerDriver
Source: https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#cuda-compatibility-
First I updated to 550, to no avail. Installing 560.35.05 fixed the issue on the 2060 (1060 pending confirmation). Is that expected behaviour? As mentioned above, I assumed that there shouldn’t be any issues running CUDA 12.6 applications on systems with the 535 driver, because it meets the minimum driver version requirement (>=525.60.13). What am I missing?
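To make the reasoning above concrete, here is a minimal Python sketch of the version comparison being assumed. The only number taken from the documentation quote is the 525.60.13 minimum; the function names are hypothetical helpers, not part of any NVIDIA tool.

```python
from typing import Tuple

# Minimum driver for the CUDA 12.x family, as quoted from the
# compatibility documentation earlier in this post.
CUDA_12_MIN_DRIVER = "525.60.13"


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn '535.183.01' into (535, 183, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str = CUDA_12_MIN_DRIVER) -> bool:
    """True if the installed driver is at least the documented minimum."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)


# The three driver versions tried in this thread.
for driver in ("535.183.01", "550.54.14", "560.35.05"):
    print(driver, meets_minimum(driver))
```

All three drivers pass this check, which is why the failure on 535 and 550 is surprising if the minimum driver version is the only requirement.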
Good question, which made me realize that I did not mention the OS I am running: Ubuntu 22. Thanks for that!
I am using proprietary drivers. I installed 535 and 550 with apt install nvidia-driver-${version}. I also tried installing the latter using NVIDIA’s run file (wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run). Both caused the issue I described.
Finally, grabbing the latest run file, cuda_12.6.3_560.35.05_linux.run, fixed the issue. The problem is that 560 does not appear to be available as an Ubuntu 22 package (see https://launchpad.net/ubuntu/+source/nvidia-graphics-drivers-560, it’s just Oracular and Plucky), whereas we need libnvidia-gl-${driver_version} (NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD), which happens to be part of the Ubuntu package.
So I ended up with a mix of whatever is installed by apt install nvidia-driver-550 and driver+crt from cuda_12.6.3_560.35.05_linux.run. I am not sure how bad that is.
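One way to see how mixed such an install is would be to compare the kernel module version with the versions of the user-space libcuda files on disk. The Python sketch below assumes the usual Linux layout: /proc/driver/nvidia/version contains an "NVRM version:" line, and the driver library is installed as libcuda.so.<driver version>. The library directory is an assumption for Ubuntu's multiarch layout, and the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches driver versions such as 535.183.01 or 550.54.14.
VERSION_RE = re.compile(r"\b(\d{3}\.\d+(?:\.\d+)?)\b")


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Extract the driver version from the text of /proc/driver/nvidia/version."""
    for line in proc_text.splitlines():
        if line.startswith("NVRM version:"):
            match = VERSION_RE.search(line)
            return match.group(1) if match else None
    return None


def userspace_versions(lib_dir: Path) -> List[str]:
    """Collect versions encoded in libcuda.so.<version> file names."""
    if not lib_dir.is_dir():
        return []
    found = set()
    for path in lib_dir.glob("libcuda.so.*"):
        suffix = path.name[len("libcuda.so."):]
        if VERSION_RE.fullmatch(suffix):  # skips the libcuda.so.1 symlink
            found.add(suffix)
    return sorted(found)


def is_mixed(kernel: Optional[str], userspace: List[str]) -> bool:
    """True if any user-space libcuda differs from the kernel module version."""
    return kernel is not None and any(v != kernel for v in userspace)


if __name__ == "__main__":
    proc = Path("/proc/driver/nvidia/version")
    kernel = kernel_module_version(proc.read_text()) if proc.exists() else None
    # Assumed library directory; adjust for other distributions.
    libs = userspace_versions(Path("/usr/lib/x86_64-linux-gnu"))
    print("kernel module:", kernel)
    print("user-space libcuda:", libs)
    print("mixed install:", is_mixed(kernel, libs))
```

If the kernel module and a user-space libcuda report different versions, leftover files from one of the two installers are likely still present.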
Initially I thought you were having trouble due to the “limited feature-set” qualifier in the forward compatibility requirements, but were not getting the cudaErrorCallRequiresNewerDriver message for some reason.
I then saw in the Release Notes for 12.6 that this was the first version to use the open drivers by default, and thought this was causing issues. On rechecking, I now see that it’s only your Pascal GTX 1060 that should be incompatible, so this is perhaps a side issue.
Looking at the “limited features” issue, the documentation mentions:
“Sometimes features introduced in a CUDA Toolkit version may actually span both the toolkit and the driver. In such cases an application that relies on features introduced in a newer version of the toolkit and driver may return the following error on older drivers: cudaErrorCallRequiresNewerDriver. As mentioned earlier, admins should then upgrade the installed driver also.”
And given your application was built with 12.6 and only functioned with that version driver, this was the situation you were striking, albeit with a different error message.
Hopefully someone more knowledgable will comment.
On the open vs proprietary driver issue, I have no experience with apt, being in the Red Hat ecosystem and always using runfile installers. Looking at the last line of Section 1.2.1 in my first link above, it sounds like open driver packages should be prefixed “nvidia-open”.
Thanks so much for looking into this. I did not know that the open drivers are the default and preferred option for Turing and newer GPUs. So I made sure that I installed the open version of the latest available NVIDIA package, i.e. 550 for Ubuntu 22. Sadly, to no avail.
“Sometimes features introduced in a CUDA Toolkit version may actually span both the toolkit and the driver. In such cases an application that relies on features introduced in a newer version of the toolkit and driver may return the following error on older drivers: cudaErrorCallRequiresNewerDriver. As mentioned earlier, admins should then upgrade the installed driver also.”
And given your application was built with 12.6 and only functioned with that version driver, this was the situation you were striking, albeit with a different error message.
The error that I got, cudaErrorCompatNotSupportedOnDevice, already happens when calling cudaGetDeviceCount. It’s a bit suspicious that such basic functionality could break on account of the limited feature-set. Regarding cudaErrorCompatNotSupportedOnDevice, the documentation says the following:
* This error indicates that the system was upgraded to run with forward compatibility
* but the visible hardware detected by CUDA does not support this configuration.
* Refer to the compatibility documentation for the supported hardware matrix or ensure
* that only supported hardware is visible during initialization via the CUDA_VISIBLE_DEVICES
* environment variable.
Which I feel suggests a more severe incompatibility, i.e. the kind that could appear if the minimum driver version requirement is violated. But like you said, it could be that it is indeed a limited-features issue, only with a different error message.
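Since the error text talks about forward compatibility, one thing that could help narrow this down is checking which libcuda the process actually loads and what the driver API itself reports. The following is only a diagnostic sketch in Python using ctypes, not a confirmed explanation of the failure. It assumes Linux (it reads /proc/self/maps) and that the driver library is reachable as libcuda.so.1. cuInit and cuDriverGetVersion are real driver API entry points; the helper names are hypothetical.

```python
import ctypes
import os
from typing import List, Optional, Tuple

CUDA_SUCCESS = 0
# Driver API counterpart of cudaErrorCompatNotSupportedOnDevice.
CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE = 804


def decode_cuda_version(value: int) -> Tuple[int, int]:
    """CUDA encodes versions as 1000*major + 10*minor (12060 means 12.6)."""
    return value // 1000, (value % 1000) // 10


def loaded_libcuda_paths() -> List[str]:
    """List the libcuda files mapped into this process (Linux only)."""
    paths = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                fields = line.split()
                if len(fields) >= 6 and "libcuda.so" in fields[-1]:
                    paths.add(fields[-1])
    except OSError:
        pass
    return sorted(paths)


def probe_driver(lib_name: str = "libcuda.so.1") -> Optional[dict]:
    """Load the driver library and report its version and the cuInit result."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None  # library not found on this machine
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    lib.cuDriverGetVersion.argtypes = [ctypes.POINTER(ctypes.c_int)]
    lib.cuDriverGetVersion.restype = ctypes.c_int

    version = ctypes.c_int(0)
    version_rc = lib.cuDriverGetVersion(ctypes.byref(version))
    init_rc = lib.cuInit(0)
    return {
        "loaded_from": loaded_libcuda_paths(),
        "driver_cuda_version": decode_cuda_version(version.value)
        if version_rc == CUDA_SUCCESS else None,
        "cuInit_result": init_rc,
        "compat_error": init_rc == CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE,
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    print(probe_driver())
```

If "loaded_from" points somewhere other than the system driver directory, that would suggest a different libcuda is being picked up ahead of the installed one, which would be worth ruling out before blaming the limited feature-set.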
Hopefully someone more knowledgable will comment.
Let’s see, perhaps someone will shed some light on this. But once again @rs277, thanks a lot for all the info!