There exist four combinations:
32-bit application loads 32-bit cubin <-- Possible
32-bit application loads 64-bit cubin <-- Is this possible?
64-bit application loads 32-bit cubin <-- Is this possible?
64-bit application loads 64-bit cubin <-- Possible
In CUDA 4.2, my tests with cuModuleLoad suggest that the second and third combinations are not supported.
Are they really unsupported?
In CUDA 3.0, it was possible to load a 32-bit cubin from a 64-bit application. Is this no longer supported?
CUDA requires all data types, including long, size_t, and pointers, to have the same size on the host and the device. In that context, using a cubin whose bitness differs from that of the host code it interfaces with does not make sense to me: the two sides would disagree on the size of kernel arguments, which would appear to lead to errors from data size mismatches during kernel calls.
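To make the size mismatch concrete, here is a minimal host-only sketch in C++. It is not how the driver actually marshals arguments; the `KernelArgs` struct, the `Args32`/`Args64` aliases, and the kernel signature `(float* data, size_t n)` are hypothetical names I made up for illustration. Fixed-width integers stand in for the pointer and size_t types of each ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical argument block for a kernel taking (float* data, size_t n).
// Ptr and Size stand in for the pointer and size_t types of one ABI.
template <typename Ptr, typename Size>
struct KernelArgs {
    Ptr  data;  // device pointer, stored as an integer of the ABI's width
    Size n;     // element count
};

// How a 32-bit side and a 64-bit side would each lay out the same arguments.
using Args32 = KernelArgs<std::uint32_t, std::uint32_t>;
using Args64 = KernelArgs<std::uint64_t, std::uint64_t>;

// Byte offset at which each side expects to find the second argument.
constexpr std::size_t offset_n_32 = offsetof(Args32, n);
constexpr std::size_t offset_n_64 = offsetof(Args64, n);

// The two layouts differ in total size and in where 'n' lives, so a
// 64-bit host handing its argument block to 32-bit device code (or the
// reverse) would have 'n' read from the wrong bytes.
static_assert(sizeof(Args32) != sizeof(Args64), "layouts differ in size");
static_assert(offset_n_32 != offset_n_64, "second argument moves");
```

With these stand-in types, the same two arguments occupy a different number of bytes on each side, and the second argument sits at a different offset, which is the kind of disagreement I had in mind.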
I confirmed with the driver team that the behavior you are seeing with CUDA 4.2 (i.e. a 32-bit cubin requires a 32-bit application, and a 64-bit cubin requires a 64-bit application) is the designed behavior and in that sense correct. I can’t speak to the behavior of previous CUDA versions.
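If you want to catch a mismatch before calling cuModuleLoad, one option is to inspect the cubin yourself. The sketch below rests on an assumption I have not confirmed with the driver team: that the cubin is an ELF image, so that byte 4 of the file (the ELF `EI_CLASS` field) is 1 for a 32-bit image and 2 for a 64-bit image. The function names `elf_bitness`, `host_bitness`, and `cubin_matches_host` are hypothetical helpers, not part of any CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns 32 or 64 for an ELF image, or 0 if the buffer does not look
// like ELF. Assumption: the cubin is an ELF container, so byte 4
// (EI_CLASS) holds 1 for a 32-bit image and 2 for a 64-bit image.
inline int elf_bitness(const std::vector<unsigned char>& image) {
    if (image.size() < 5) return 0;
    // ELF magic: 0x7f 'E' 'L' 'F'
    if (image[0] != 0x7f || image[1] != 'E' ||
        image[2] != 'L'  || image[3] != 'F') {
        return 0;
    }
    switch (image[4]) {
        case 1:  return 32;  // ELFCLASS32
        case 2:  return 64;  // ELFCLASS64
        default: return 0;   // unknown class
    }
}

// Bitness of the application this code is compiled into.
inline int host_bitness() {
    return static_cast<int>(sizeof(void*) * 8);
}

// True when the image's bitness equals the application's bitness,
// which is the pairing described above as the designed behavior.
inline bool cubin_matches_host(const std::vector<unsigned char>& image) {
    return elf_bitness(image) == host_bitness();
}
```

You would read the cubin file into a byte buffer, call `cubin_matches_host` on it, and only proceed to cuModuleLoad when it returns true. The return code of cuModuleLoad remains the authoritative answer; this check only gives an earlier and more specific diagnostic.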