cuModuleLoad fails to load PTX for architectures higher than the GPU's. (Documentation issue, or so.)


Apparently cuModuleLoad fails when trying to load a PTX file for a higher architecture than the GPU in the system.

This may seem logical or perhaps not… either way, I am not sure if this behaviour is documented.

I was kinda hoping to load all the PTX files generated for all architectures, and then later display some information about them, like attributes and such, by querying the driver API.

Now, because the loads fail with the error CUDA_ERROR_NO_BINARY_FOR_GPU, this is not possible.

I guess the just-in-time compiler tries to compile the PTX file when it’s loaded, and apparently it can’t compile it for an architecture the GPU doesn’t support… hmm… I still think it’s kinda strange.

Perhaps this API needs to be further refined with an extra call named:

“cuModuleCompile” or “cuModuleBuild”

Then again, I can understand a load giving this error for cubins… but for PTX it doesn’t really make sense… maybe not even for cubins…

Perhaps this particular error should be postponed until the kernel actually executes… but that would cause unnecessary compile delays at launch time, which should be paid only once, up front… hence the need for cuModuleCompile.

However, “compiling” cubins might not make sense, so the API could instead be named:

“cuModuleBuild”

^ I think that makes more sense.

This way… it might be possible to query the driver for kernel attributes of compute capabilities higher than the GPU’s own.

This missing functionality is not that big of a deal.

I guess I can simply adjust my program to set an array of booleans to false for load failures, and so forth.
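That workaround could be sketched roughly like this (a minimal sketch against the CUDA driver API; the file names and kernel count are made up for illustration, and error checking on the setup calls is skipped):

```c
#include <stdio.h>
#include <stdbool.h>
#include <cuda.h>

int main(void)
{
    /* Hypothetical PTX files, one per target architecture. */
    const char *ptxFiles[] = { "kernel_sm_20.ptx", "kernel_sm_35.ptx", "kernel_sm_52.ptx" };
    enum { N = sizeof ptxFiles / sizeof ptxFiles[0] };
    CUmodule modules[N];
    bool loadable[N];

    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    for (int i = 0; i < N; ++i) {
        CUresult r = cuModuleLoad(&modules[i], ptxFiles[i]);
        loadable[i] = (r == CUDA_SUCCESS);
        if (r == CUDA_ERROR_NO_BINARY_FOR_GPU)
            printf("%s: no binary for this GPU (arch too high?)\n", ptxFiles[i]);
        else if (r != CUDA_SUCCESS)
            printf("%s: load failed with error %d\n", ptxFiles[i], (int)r);
    }

    /* ... later: only query attributes of modules[i] where loadable[i] is true ... */
    cuCtxDestroy(ctx);
    return 0;
}
```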

At the very least, cuModuleLoad’s documentation should be updated to indicate that loading a PTX for a higher architecture than the current GPU can fail.

Of course, herein lies the problem… PTX is supposed to be forward/backward compatible… but currently it’s not.

This error message again proves that.


Hmmm… I just discovered something… all kernels were compiled for sm_21… perhaps this happened because I didn’t specify -code… I only used -arch.

Apparently the nvcc compiler assumes that I am compiling for my current GPU?! when using only -arch?!

This is kinda stupid?! And also inconsistent with the documentation, which said: -code takes on the -arch value if -code is not specified?!

I will try to re-compile all kernels, but this time explicitly mentioning -code!!!

This documentation is weird:

The -code option can be omitted. Only in this case, the -arch value can be a non-virtual architecture. The -code values default to the closest virtual architecture that is implemented by the GPU specified with -arch, plus the -arch value itself (in case the -arch value is a virtual architecture then these two are the same, resulting in a single-code default). After that, the effective -arch value will be the closest virtual architecture: - See more at: file:///C:/Program%20Files/NVIDIA%20GPU%20Computing%20Toolkit/CUDA/v7.0/doc/html/cuda-compiler-driver-nvcc/index.html#nvcc-command-options
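If I understand that rule correctly, it amounts to something like this (a sketch; the input file name is hypothetical):

```shell
# -code omitted: -arch may name a *real* architecture (e.g. sm_52);
# nvcc then defaults -code to the closest virtual arch plus sm_52 itself.
nvcc -fatbin -arch=sm_52 kernel.cu

# Explicit equivalent: once -code is given, -arch must be a *virtual*
# architecture (compute_XX), and -code lists what actually gets embedded.
nvcc -fatbin -arch=compute_52 -code=compute_52,sm_52 kernel.cu
```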

I shall now try to specify -code myself to force it to generate SM_52… while my installed GPU is only SM_21.

I want to generate code for other people not myself.

Just for the record, I tried something like this:
nvcc -ptx --machine=32 -arch=sm_53 -code=sm_53

Perhaps I am seeing “ghosts” ;) :)

Once the other modules are loaded, the binary version returned by the kernel attributes is always 2.1?!

I guess the driver compiled the PTX to some kind of binary, and is returning the binary version of what it compiled to… so perhaps it’s not an nvcc thing…
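The attributes I am querying look roughly like this (a sketch; the kernel name "myKernel" is made up, and the module is assumed to be already loaded):

```c
#include <stdio.h>
#include <cuda.h>

/* Given a loaded module, print the PTX version the kernel was compiled
   for and the binary (SASS) version the driver JIT-compiled it to. */
static void printVersions(CUmodule mod)
{
    CUfunction fn;
    int ptxVer = 0, binVer = 0;

    cuModuleGetFunction(&fn, mod, "myKernel");
    cuFuncGetAttribute(&ptxVer, CU_FUNC_ATTRIBUTE_PTX_VERSION, fn);
    cuFuncGetAttribute(&binVer, CU_FUNC_ATTRIBUTE_BINARY_VERSION, fn);

    /* e.g. ptxVer 21 means the PTX targets compute_21; binVer 21 means
       the driver produced (or JIT-compiled) SASS for sm_21. */
    printf("ptx version: %d, binary version: %d\n", ptxVer, binVer);
}
```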
Didn’t compile:
nvcc fatal : Value of -arch option (‘sm_53’) must be a virtual code architecture
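That fatal error seems to be because, once -code is specified, -arch must name a *virtual* architecture. If I understand it correctly, something like this should work instead (input file name hypothetical):

```shell
# PTX targets a virtual architecture, so -arch must be compute_XX;
# -code is not needed when only PTX output (-ptx) is requested.
nvcc -ptx --machine=32 -arch=compute_53 kernel.cu
```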