Yikes, bad computation results with CUDA 7.5 release driver


Out of curiosity, I installed CUDA 7.5 on a GTX 970-equipped Ubuntu 12.04 system (I know, unsupported configuration…). I tested some of the included CUDA samples; they ran fine.

Interestingly, one of our production kernels gave bad computation results after the driver update. The code was compiled for sm_30 with the CUDA 5.0 toolkit, so on my GTX 970 the driver has to translate it at run time; hence I suspect a problem with that JIT translation.
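One way to confirm that the driver really is JIT-compiling here is to list the cubins embedded in the executable. The sketch below is a hypothetical helper, not part of any NVIDIA tool: it runs `cuobjdump --list-elf` (available in recent toolkits) and assumes each listed cubin name carries an `sm_XX` tag. It relies on the rule that SASS built for compute capability X.y runs natively only on devices X.z with z >= y; any other device needs the driver to JIT-compile embedded PTX.

```python
import re
import subprocess


def list_elf(binary):
    """Run `cuobjdump --list-elf` on a CUDA executable and return its text output."""
    result = subprocess.run(["cuobjdump", "--list-elf", binary],
                            capture_output=True, text=True, check=True)
    return result.stdout


def embedded_archs(listing):
    """Collect compute capabilities (e.g. 30 for sm_30) from sm_XX tags.

    Assumption: every embedded cubin appears in the listing with an sm_XX tag.
    """
    return sorted({int(tag) for tag in re.findall(r"sm_(\d+)", listing)})


def needs_jit(listing, device_cc):
    """Return True if no embedded cubin runs natively on a device with
    compute capability device_cc (written as 52 for 5.2).

    SASS for X.y is binary compatible only with devices X.z where z >= y,
    so no match means the driver must JIT-compile PTX instead.
    """
    major, minor = divmod(device_cc, 10)
    for cc in embedded_archs(listing):
        if cc // 10 == major and cc % 10 <= minor:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical listing for a binary that carries sm_30 SASS only.
    sample = "ELF file    1: app.1.sm_30.cubin\n"
    print(needs_jit(sample, 52))  # GTX 970 is compute capability 5.2
    print(needs_jit(sample, 35))  # a Kepler GK110 part could run sm_30 SASS
```

In practice one would call `needs_jit(list_elf("./app"), 52)`, where `./app` stands in for the real executable.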

Has anyone else observed known-good kernels producing bad output after upgrading from an older driver (e.g. the one shipped with CUDA 7.0) to the driver shipping with the CUDA 7.5 release?

I downgraded the CUDA driver to the one included in the CUDA 7.0 toolkit and all is fine again.


You probably want to report this to NVIDIA as a bug, although it is also possible that your source code contains a latent bug that has now been exposed by the more aggressively optimizing compiler backend (ptxas) in the new driver.

The potential for cross-architecture JIT compilation to be broken is somewhat greater than for offline compilation. The offline compiler has to deal with exactly one combination of frontend and backend, and that combination is carefully tested. For JIT compilation there are numerous combinations: many older frontends producing many different older versions of PTX, all of which are then compiled by one recent backend. That is a much bigger test space.

It is possible the new backend in the driver has a bug, but it is also possible the old frontend that produced the PTX used in your JIT compilation had a bug that went undetected due to artifacts of older backends’ code generation.

My recommendation would be to rely on JIT compilation only if absolutely necessary. Instead, create a fat binary that incorporates SASS for each GPU architecture your app needs to support, plus PTX for only the latest architecture, for forward compatibility with yet-to-ship architectures.
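With nvcc, such a fat binary is typically requested with one `-gencode` option per target. The helper below is only a sketch (the function name `gencode_flags` and the file `app.cu` are made up): it emits `arch=compute_XX,code=sm_XX` for every architecture, which embeds SASS, plus a final `code=compute_XX` entry for the newest one, which embeds PTX that the driver can JIT-compile on future GPUs. Targeting the GTX 970 (sm_52) natively this way requires a toolkit that knows sm_52, which CUDA 5.0 does not.

```python
import shlex


def gencode_flags(archs):
    """Build nvcc -gencode options for a fat binary.

    Every architecture in `archs` (e.g. 30 for sm_30) gets native SASS;
    only the newest one additionally gets PTX for forward compatibility.
    """
    if not archs:
        raise ValueError("need at least one target architecture")
    ordered = sorted(set(archs))
    flags = []
    for cc in ordered:
        # SASS: compiled offline, runs without JIT on matching GPUs.
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    newest = ordered[-1]
    # PTX: lets the driver JIT-compile for architectures that do not exist yet.
    flags += ["-gencode", f"arch=compute_{newest},code=compute_{newest}"]
    return flags


if __name__ == "__main__":
    # Kepler (sm_30, sm_35) plus Maxwell GTX 970 (sm_52); app.cu is a placeholder.
    print(shlex.join(["nvcc", *gencode_flags([30, 35, 52]), "-o", "app", "app.cu"]))
```

Emitting PTX only for the newest architecture keeps the binary small while still giving the driver something to JIT-compile on GPUs released later.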

While we’d love to work without JIT translation, we currently cannot target Maxwell GPUs with the CUDA 5.0 toolkit we build our application with. Upgrading the toolkit is considered an even bigger risk than upgrading the driver. ;)

We previously used CUDA 2.3, switched to 5.0 not so long ago, and the next leap may be to 7.0 (because I like that it supports lambda expressions and is still supported on Ubuntu 12.04).

As for filing a bug report: the faulty kernel is part of a bigger project, and building a simple repro case would take too much effort for now. I’ll reconsider filing a bug report later.


In that case, unless you absolutely need some functionality provided by a newer driver, I’d say stick with the one you know works.

Whether upgrading the toolkit or the driver is the bigger risk I cannot answer based on data, but my gut feeling is that a driver change represents a bigger risk in terms of functionally broken code: it is a complex low-level mechanism with close to zero visibility to the average CUDA programmer. With a toolkit change one can at least look at the generated code, check timing with the profiler, use cuda-memcheck to check for race conditions, etc.
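As an illustration of looking at the generated code after a toolkit change, here is a rough sketch that disassembles two builds with `cuobjdump -sass` and compares the listings with Python's difflib. The script and binary names are placeholders. Expect some noise from shifted addresses and encodings; a non-empty diff only tells you where to look and does not by itself prove a miscompilation. For race conditions, cuda-memcheck's racecheck tool (`cuda-memcheck --tool racecheck ./app`) covers the other item mentioned above.

```python
import difflib
import subprocess
import sys


def dump_sass(binary):
    """Return the SASS disassembly of a CUDA executable via `cuobjdump -sass`."""
    result = subprocess.run(["cuobjdump", "-sass", binary],
                            capture_output=True, text=True, check=True)
    return result.stdout


def sass_diff(old_sass, new_sass):
    """Unified diff of two SASS listings; an empty list means identical text."""
    return list(difflib.unified_diff(old_sass.splitlines(), new_sass.splitlines(),
                                     fromfile="old", tofile="new", lineterm=""))


if __name__ == "__main__":
    # Usage (placeholder names): python sassdiff.py app_cuda50 app_cuda70
    if len(sys.argv) == 3:
        for line in sass_diff(dump_sass(sys.argv[1]), dump_sass(sys.argv[2])):
            print(line)
```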

As for using JIT compilation to target GPU architectures that did not exist when the application code was created, my long-standing recommendation is to switch to natively compiled code at the first opportunity, that is, ASAP.

Long-term use of JIT compilation is best limited to dynamic code generation scenarios, where code must be generated at run time.