Disable PTX JIT Compilation

image
Hello folks,

I am having a hard time disabling PTX JIT compilation of my code, even though I am forcing CUDA_DISABLE_PTX_JIT to 1 in the Windows environment variables (see the image).

Can you please help?

Thanks

Abdoulaye

Set this environment variable:

CUDA_DISABLE_PTX_JIT=1

Here is the relevant section of the CUDA Programming Guide:

That’s what I did, as shown in the screenshot.

You wrote about CUDA_CACHE_DISABLE, not CUDA_DISABLE_PTX_JIT. Can you be more specific about the issue than “I am having hard times”?

Have you double checked that you are setting CUDA_DISABLE_PTX_JIT=1 in the environment relevant to the execution of your application? For example, you could call getenv() in your application to inspect the environment the application operates in.
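The getenv() check suggested above could look roughly like the following. This is only a minimal sketch: the helper names env_or_unset and ptx_jit_disabled_in_env are made up for illustration, and the "<unset>" marker is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of an environment variable as seen
// by *this* process, or the marker "<unset>" if it is not defined.
std::string env_or_unset(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("<unset>");
}

// Hypothetical helper: true only if CUDA_DISABLE_PTX_JIT is visible to this
// process and set to exactly "1".
bool ptx_jit_disabled_in_env() {
    return env_or_unset("CUDA_DISABLE_PTX_JIT") == "1";
}
```

Printing env_or_unset("CUDA_DISABLE_PTX_JIT") at the very start of main(), before any CUDA call, shows what the application actually inherited. A process launched from an IDE or a debugger may see a different environment than a freshly opened command prompt.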

You could also try setting CUDA_DISABLE_JIT=1.

Sorry I mistyped the keyword.

I did this, and getenv() also returns “1”. My problem is that whenever I run in Visual Studio debug mode, my program stalls for 30 seconds at the line where I create a CUDA context for cuSOLVER. So I suspected that PTX JIT compilation was the culprit.

Also, when I use Nsight Compute, the PTX code appears along with the SASS. I want to show only SASS, for intellectual property reasons.

Thanks

Abdoulaye

Unless there are other differences in the compilation flags for debug and release builds, I would expect any noticeable JIT overhead to affect both debug and release builds. Because debug builds are unoptimized, I would further expect any JIT overhead that does occur to be slightly lower for debug builds. So attributing delay seen when starting up the debug build to JIT overhead seems questionable.

Build the binary without including PTX for those situations. Depositing only SASS in the fat binary is a choice readily available to the programmer. See the documentation of the -gencode switch. This will also be a good experiment for investigating the mysterious 30-second delay. Without PTX in the fat binary, JIT compilation is impossible. That still leaves the possibility of PTX being present in libraries you do not control. Outside of special circumstances, I would expect libraries to be compiled such that SASS is included for all supported GPU architectures, meaning PTX JIT compilation never comes into play.

My configuration is arch=compute_86,code=sm_86

Use cuobjdump --dump-sass and cuobjdump --dump-ptx to examine what SASS and what PTX were deposited into your binary.

Well, I was able to run both commands, and for PTX I can see my function and variable symbols listed. So, do you know how to disable the PTX?

Wait! In my Visual Studio script, I am seeing -gencode=arch=compute_86,code="sm_86,compute_86".

But the value I put in the option is compute_86,sm_86. Is that right?

With -gencode arch=compute_86,code=sm_86 alone, there should be no PTX embedded in the fat binary. Just to be sure, I compiled a small test application in two ways:

nvcc -o zcopy.exe -arch=sm_30 zcopy.cu
nvcc -o zcopy.exe -gencode arch=compute_30,code=sm_30 zcopy.cu

Using cuobjdump to inspect, I find that the binary produced by the first build contains PTX and SASS for sm_30, while the binary produced by the second build contains SASS for sm_30 and an empty PTX section. This is exactly as expected.

So if you are seeing PTX code in the fat binary, there must be other -gencode instances besides -gencode arch=compute_86,code=sm_86.

The compute_86 part is what causes PTX to be deposited into the fat binary. Take a look at the documentation for -gencode: sm_XX deposits SASS for a real GPU architecture, compute_XX deposits PTX for a virtual GPU architecture. Your build specifies to do both.
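To make the sm_XX versus compute_XX distinction concrete, here is a small sketch that splits the code= list of a -gencode option into the targets that deposit SASS and the targets that deposit PTX. The struct and function names are hypothetical and not part of any NVIDIA tool; the sketch only looks at the sm_ and compute_ prefixes and ignores any other entry.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two kinds of targets in a code= list.
struct GencodeTargets {
    std::vector<std::string> sass;  // real architectures (sm_XX) -> SASS
    std::vector<std::string> ptx;   // virtual architectures (compute_XX) -> PTX
};

// Splits a comma-separated code= list such as "sm_86,compute_86" and sorts
// each entry by its prefix. Entries with any other prefix are ignored.
GencodeTargets classify_code_list(const std::string& code_list) {
    GencodeTargets out;
    std::stringstream ss(code_list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.rfind("sm_", 0) == 0) {
            out.sass.push_back(item);
        } else if (item.rfind("compute_", 0) == 0) {
            out.ptx.push_back(item);
        }
    }
    return out;
}
```

For the list "sm_86,compute_86" seen in the Visual Studio command line, the ptx vector is non-empty, which matches the PTX that cuobjdump --dump-ptx reports. For "sm_86" alone it is empty.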

image

Well, I think I found the problem. But I don’t know why it’s compiling with -gencode=arch=compute_86,code="sm_86,compute_86" instead of -gencode=arch=compute_86,code="sm_86".

OK, I get it now! I simply need to put sm_86. Thank you so much for your help :)

@njuffa, a little help please. Can you tell me exactly what I should put in this Visual Studio input field? I tried sm_86, then sm_86,sm_86, and neither works. Thanks

@njuffa I think I finally found the issue. It seems that PTX JIT is only activated when running under CUDA Debugging (Next-Gen), whereas when running from the Start button (VS2022), JIT is deactivated.
