Apparently cuModuleLoad fails when trying to load a PTX file built for a higher architecture than the GPU in the system.
This may seem logical or perhaps not… either way… I am not sure if this behaviour is documented.
I was kinda hoping to load all the PTX files generated for all architectures, and then later display some information about them, like attributes and such, by querying the driver API.
Now, because the loads fail with the error CUDA_ERROR_NO_BINARY_FOR_GPU, this is not possible.
I guess the “just in time compiler” tries to compile the PTX file when it’s loaded, and apparently it can’t compile it for a GPU of a lower architecture than the PTX targets… hmm… I still think it’s kinda strange.
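One way to avoid triggering the error at all might be to read the ".target" directive out of the PTX text before calling cuModuleLoad, and compare it against the device's compute capability. The sketch below is only an illustration: the function names ptxTargetSm and ptxLoadableOn are made up, it only handles plain "sm_NN" targets (a suffixed target such as "sm_90a" would need extra handling), and it assumes the device's major/minor numbers were already obtained elsewhere, for example via cuDeviceGetAttribute.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: returns NN from the first ".target sm_NN" directive
// found in the PTX text, or -1 if no such directive is found.
int ptxTargetSm(const std::string& ptxText) {
    std::istringstream lines(ptxText);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string directive, target;
        // Skip lines that do not look like ".target <something>".
        if (!(words >> directive >> target) || directive != ".target") continue;
        const std::string prefix = "sm_";
        if (target.compare(0, prefix.size(), prefix) != 0) return -1;
        std::size_t i = prefix.size();
        if (i >= target.size() ||
            !std::isdigit(static_cast<unsigned char>(target[i]))) return -1;
        int sm = 0;
        // Read the digits only; anything after them (e.g. a comma) is ignored.
        while (i < target.size() &&
               std::isdigit(static_cast<unsigned char>(target[i]))) {
            sm = sm * 10 + (target[i] - '0');
            ++i;
        }
        return sm;
    }
    return -1;
}

// Hypothetical helper: PTX for sm_NN can be JIT-compiled for a device whose
// compute capability is NN or higher, so loading is only worth attempting then.
bool ptxLoadableOn(int ptxSm, int deviceMajor, int deviceMinor) {
    return ptxSm >= 0 && ptxSm <= deviceMajor * 10 + deviceMinor;
}
```

For example, a PTX file containing ".target sm_35" would be skipped on a compute capability 2.1 device, because ptxLoadableOn(35, 2, 1) is false.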
Perhaps this API needs to be further refined with an extra API call named:
“cuModuleCompile” or “cuModuleBuild”
Then again, I can understand that for cubins it would make some sense for the load to give this error… but for PTX it doesn’t really make sense… maybe not even for cubins…
Perhaps this particular error should be postponed until the kernel executes… but that would cause unnecessary compile delays… the compile should be done only once, beforehand… hence the need for cuModuleCompile.
However, compiling might not make sense for cubins, so the API could be renamed to:
^ I think that makes more sense.
This way… it might be possible to query the device driver for attributes of kernels built for higher compute capabilities than the GPU itself.
This missing functionality is not that much of a big deal.
I guess I can simply adjust my program to set an array of booleans to false for load failures and so forth.
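The boolean-array workaround could look roughly like this. It is just a sketch under a few assumptions: tryLoadAll and the Loader type are made-up names, the loader is passed in as a callback so the snippet compiles without the CUDA toolkit, and the two result codes are copied by hand from cuda.h (worth checking against your own header). In the real program the callback would wrap cuModuleLoad and return its CUresult.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Result codes as they appear in cuda.h, repeated here (assumption: verify
// against your header) so the sketch builds without the toolkit installed.
constexpr int kCudaSuccess = 0;        // CUDA_SUCCESS
constexpr int kNoBinaryForGpu = 209;   // CUDA_ERROR_NO_BINARY_FOR_GPU

// Hypothetical loader callback: takes a PTX path, returns a driver result code.
using Loader = std::function<int(const std::string&)>;

// Tries every PTX file and records, per file, whether the load succeeded.
// Failed loads (for example kNoBinaryForGpu) simply stay false.
std::vector<bool> tryLoadAll(const std::vector<std::string>& ptxPaths,
                             const Loader& load) {
    std::vector<bool> loaded(ptxPaths.size(), false);
    for (std::size_t i = 0; i < ptxPaths.size(); ++i) {
        loaded[i] = (load(ptxPaths[i]) == kCudaSuccess);
    }
    return loaded;
}
```

Later code can then check loaded[i] before querying attributes for module i, and show something like "not available on this GPU" for the entries that are false.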
At the very least, cuModuleLoad’s documentation should be updated to indicate that loading a PTX file for a higher architecture than the current GPU can fail.
Of course, here lies the problem… PTX is supposed to be forward/backward compatible… but currently it’s not.
This error message again proves that.