We have an application where we use device link-time optimization (DLTO). We generate a fatbinary containing PTX for the lowest arch (e.g. sm_52), and LTO and SASS for a number of explicit architectures (e.g. sm_52 and sm_61), using the following options:
Previously, when running the application on a later GPU arch that wasn’t explicitly included in the fatbinary (e.g. sm_86), the driver (v516.59) would JIT compile/link the application and it would run fine. However, after upgrading to the v526.67 driver, the application fails with a “device kernel image is invalid” error.
Is this a bug in the latest NVIDIA GPU driver, or should we be using different compiler/linker options?
This can be reproduced using the above compiler/linker options with the following minimal example:
__device__ int d_result;
__global__ void kernel(const int n)
d_result = n;
const int n = rand();
cudaError cuda_status = cudaGetLastError();
if (cuda_status != cudaSuccess)
std::cout << "FAIL: " << cudaGetErrorString(cuda_status) << std::endl;
std::cout << "PASS";
This is not the point of the question asked, but I am wondering what the rationale for this is? Conventional wisdom is that in order to future-proof a fat binary, one wants to include SASS/LTO for any GPU architectures specifically supported by the application and PTX for the latest architecture supported by the tool chain to cover any future GPU architectures.
Interesting! I’m not sure if the docs have changed since I last looked at them (years ago now ha), but they certainly seem to suggest the approach you outline. To avoid potentially derailing this thread, I have created a new topic here: Fatbinary best practices
The suggested approach would certainly fix the issue in this case. However, it still looks like a bug or otherwise weird behaviour in the new driver, given that the previous driver version was successfully JIT-compiling the compute_52 PTX, whereas the new driver is failing (either to compile the compute_52 PTX, or is erroneously trying to JIT the lto_61 NVVM IR).