How should I correctly use sm_XX and compute_XX?

rreddy78 · June 29, 2022, 7:48am

Hi all,

I am trying to get some CUDA code that is called from a Python package to work, and it fails with the following error:

RuntimeError: CUDA error: no kernel image is available for execution on the device

I see that no nvcc_args were passed while building the Python package, but shouldn't it still work even then?

CUDA Toolkit is: 10.2.89
GPU: V100 (the datasheet says this is the Volta architecture with compute capability 7.0)

I looked at the Volta Compatibility Guide:

https://docs.nvidia.com/cuda/archive/10.2/volta-compatibility-guide/index.html

I am sure I am doing something wrong with my settings. For CUDA 9, the guide gives this example:
/usr/local/cuda/bin/nvcc \
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_70,code=compute_70 \
-O2 -o mykernel.o -c mykernel.cu

Some questions:

When should I use code=sm_XX versus code=compute_XX, or should both be used?
What should the -gencode arguments be when I want to target a single GPU architecture and nothing else?
When should the CUDA_FORCE_PTX_JIT variable be set?

I know there are some technical details about cubin versions and PTX versions, but I could not make sense of them. It would be really helpful if someone could give a simple set of guidelines for each of these use cases :)

Thanks a lot…

njuffa · June 29, 2022, 8:34am

When you read the section on code generation (“Building for Maximum Compatibility”) in the Best Practices Guide, what exactly was unclear? You may want to consult the nvcc manual in addition to the Best Practices Guide.

sm_XX pertains to machine code (SASS, in CUDA parlance) for a particular GPU hardware architecture. compute_XX pertains to virtual architectures represented by the intermediate PTX format. So in your example, the compiler is instructed to produce a fat binary containing SASS for CC 5.0, CC 5.2, CC 6.0, CC 6.1, and CC 7.0, as well as PTX for CC 7.0. This is a best practice: include SASS for all architectures the application needs to support, as well as PTX for the latest architecture (CC 7.0 for the CUDA version referenced), which can be JIT-compiled when a new (as yet unknown) GPU architecture comes along.
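To make the SASS/PTX split concrete, here is a small Python sketch (it does not call nvcc; it only reads flag strings) that takes -gencode flags like the ones in the question and reports which compute capabilities get SASS (code=sm_XX) and which get PTX (code=compute_XX). The function name decode_gencode is hypothetical, and the sketch assumes plain numeric architecture names such as sm_70.

```python
import re
from typing import Iterable

# One -gencode flag, e.g. "-gencode=arch=compute_70,code=sm_70".
# A bracketed code list such as "code=[sm_70,compute_70]" is also accepted.
GENCODE_RE = re.compile(r"-gencode[= ]arch=compute_(\d+),code=\[?([\w,]+)\]?")


def decode_gencode(flags: Iterable[str]) -> tuple[list[int], list[int]]:
    """Hypothetical helper: return (sass_ccs, ptx_ccs) requested by nvcc flags.

    code=sm_XX      -> SASS (machine code); 70 means compute capability 7.0
    code=compute_XX -> PTX (virtual ISA) kept in the fat binary for JIT
    Assumes numeric suffixes only (sm_70, not e.g. sm_90a).
    """
    sass: set[int] = set()
    ptx: set[int] = set()
    for flag in flags:
        m = GENCODE_RE.fullmatch(flag.strip())
        if m is None:
            continue  # not a -gencode flag, e.g. "-O2"
        for code in m.group(2).split(","):
            kind, _, cc = code.partition("_")
            if kind == "sm":
                sass.add(int(cc))
            elif kind == "compute":
                ptx.add(int(cc))
    return sorted(sass), sorted(ptx)


# The flags from the CUDA 9 example in the question.
example_flags = [
    "-gencode=arch=compute_50,code=sm_50",
    "-gencode=arch=compute_52,code=sm_52",
    "-gencode=arch=compute_60,code=sm_60",
    "-gencode=arch=compute_61,code=sm_61",
    "-gencode=arch=compute_70,code=sm_70",
    "-gencode=arch=compute_70,code=compute_70",
    "-O2",
]
sass_ccs, ptx_ccs = decode_gencode(example_flags)
print("SASS for:", sass_ccs)  # SASS for: [50, 52, 60, 61, 70]
print("PTX for:", ptx_ccs)    # PTX for: [70]
```

The result matches the reading above: five SASS targets and a single PTX target for the newest architecture.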

If you intend to run on a CC 7.0 (Volta) GPU, the compilation options in your example should work fine. If long compilation times bother you, you can pare down the list of architectures for which code is generated.
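A hypothetical helper can generate such a pared-down flag list: SASS for each compute capability you actually need, plus PTX for the newest one. This is a sketch only, assuming the resulting flags are then handed to nvcc by whatever mechanism the build uses (for example the nvcc_args mentioned in the question; that wiring is not shown here).

```python
def gencode_flags(ccs: list[int], ptx_for_newest: bool = True) -> list[str]:
    """Hypothetical helper: nvcc -gencode flags with SASS for every compute
    capability in `ccs` (70 means CC 7.0) plus, optionally, PTX for the newest
    one so that GPUs newer than any listed can still JIT-compile the kernels."""
    if not ccs:
        raise ValueError("need at least one compute capability")
    unique = sorted(set(ccs))
    flags = [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in unique]
    if ptx_for_newest:
        newest = unique[-1]
        flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


# Full list from the question vs. a V100-only (CC 7.0) build.
for flag in gencode_flags([50, 52, 60, 61, 70]):
    print(flag)
print(gencode_flags([70]))
```

For a V100-only build, gencode_flags([70]) yields SASS plus PTX for CC 7.0. As far as I know, nvcc's shorthand -arch=sm_70 expands to the same pair (compute_70 PTX and sm_70 SASS), so either spelling should do for a single architecture.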

If you use the -gencode options shown, the “no kernel image is available” error should not occur when running on a V100 (CC 7.0). When the CUDA runtime loads kernels onto the GPU, it first looks for matching SASS in the fat binary. If it cannot find any, it looks for PTX that it can JIT-compile. If JIT compilation is inhibited or no suitable PTX is found, it fails with this error. Since the example generates both SASS and PTX suitable for CC 7.0, loading the kernel(s) should not fail when the current device is a V100.
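The lookup order just described can be modelled in a few lines. This is a simplified sketch, not the driver's actual logic: it assumes desktop GPUs, the documented rule that SASS for CC X.y runs only on CC X.z with z >= y, and that PTX for CC X.y can be JIT-compiled for any equal-or-newer device. Picking the highest compatible version is an assumption of the sketch. The force_ptx_jit parameter mimics setting CUDA_FORCE_PTX_JIT=1, which makes the driver ignore embedded SASS and JIT-compile the PTX instead; that is one way to check that a fat binary carries usable PTX for future GPUs. The names FatBinary and select_image are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FatBinary:
    """What a fat binary contains; 70 stands for compute capability 7.0."""
    sass: set[int] = field(default_factory=set)  # code=sm_XX entries
    ptx: set[int] = field(default_factory=set)   # code=compute_XX entries


def sass_runs_on(sass_cc: int, device_cc: int) -> bool:
    # Same major version, equal or lower minor version (desktop GPUs).
    return sass_cc // 10 == device_cc // 10 and sass_cc <= device_cc


def ptx_jits_on(ptx_cc: int, device_cc: int) -> bool:
    # PTX for a virtual architecture can be JIT-compiled for any newer device.
    return ptx_cc <= device_cc


def select_image(fatbin: FatBinary, device_cc: int, force_ptx_jit: bool = False) -> str:
    """Simplified model of kernel loading: SASS first, then PTX via JIT.

    force_ptx_jit mimics CUDA_FORCE_PTX_JIT=1, which ignores embedded SASS.
    Choosing the highest compatible version is an assumption of this sketch.
    """
    if not force_ptx_jit:
        sass = [cc for cc in fatbin.sass if sass_runs_on(cc, device_cc)]
        if sass:
            return f"SASS sm_{max(sass)}"
    ptx = [cc for cc in fatbin.ptx if ptx_jits_on(cc, device_cc)]
    if ptx:
        return f"JIT from PTX compute_{max(ptx)}"
    raise RuntimeError("no kernel image is available for execution on the device")


v100 = 70
example = FatBinary(sass={50, 52, 60, 61, 70}, ptx={70})
print(select_image(example, v100))                      # SASS sm_70
print(select_image(example, v100, force_ptx_jit=True))  # JIT from PTX compute_70
try:
    select_image(FatBinary(sass={60}), v100)  # Pascal-only SASS, no PTX
except RuntimeError as err:
    print("error:", err)
```

The last case (SASS only for sm_60 and no PTX, run on a CC 7.0 device) is one way the error from the question can arise; whether that is what happened in the original build is not established in this thread.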

So something doesn’t add up here. You can use the cuobjdump utility to check for which GPU architectures SASS and/or PTX are present in the fat binary.

rreddy78 · June 30, 2022, 12:25pm

Thanks for the helpful hints and for pointing me to the right documentation. I solved it by targeting a single architecture and leaving it at that.

cuobjdump visibility_kernel.o

Fatbin ptx code:

arch = sm_70
code version = [6,5]
producer =
host = linux
compile_size = 64bit
compressed

Fatbin elf code:

arch = sm_70
code version = [1,7]
producer =
host = linux
compile_size = 64bit
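To check a fat binary programmatically rather than by eye, a small parser over the default cuobjdump listing is enough. This is a sketch that assumes the layout shown above ("Fatbin ptx code:" / "Fatbin elf code:" headers, each followed by an "arch = ..." line); separator lines and other fields are ignored. The function name fatbin_archs is hypothetical. Note that cuobjdump labels the PTX entry with arch = sm_70 as well, as in the output above.

```python
import re


def fatbin_archs(cuobjdump_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: collect the 'arch = ...' entries that the default
    `cuobjdump <file>` listing prints under 'Fatbin ptx code:' and
    'Fatbin elf code:' (ELF = SASS). Other lines, including separators, are ignored."""
    found: dict[str, list[str]] = {"ptx": [], "elf": []}
    section = None
    for raw in cuobjdump_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"Fatbin (ptx|elf) code:", line)
        if header:
            section = header.group(1)
            continue
        arch = re.fullmatch(r"arch = (\S+)", line)
        if arch and section is not None:
            found[section].append(arch.group(1))
    return found


# The listing posted above for visibility_kernel.o.
listing = """\
Fatbin ptx code:

arch = sm_70
code version = [6,5]
producer =
host = linux
compile_size = 64bit
compressed

Fatbin elf code:

arch = sm_70
code version = [1,7]
producer =
host = linux
compile_size = 64bit
"""
print(fatbin_archs(listing))  # {'ptx': ['sm_70'], 'elf': ['sm_70']}
```

On a machine with the CUDA toolkit installed, the text can be captured with subprocess.run(["cuobjdump", "visibility_kernel.o"], capture_output=True, text=True).stdout. Here both lists contain sm_70, i.e. SASS and PTX for CC 7.0, which is what a V100 needs.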
