PTX JIT caching

The Fermi Compatibility Guide shows how to use the CUDA_FORCE_PTX_JIT environment variable to force JIT compilation of the PTX code. It says that the resulting cubin is cached by the driver and that the cache even persists across reboots. However, when I try this with the SDK examples, it doesn't seem to be caching at all:

[codebox][plegresl@bigbird release]$ export CUDA_FORCE_PTX_JIT=0
[plegresl@bigbird release]$ time ./simpleCUBLAS -noprompt
simpleCUBLAS test running…
PASSED
real 0m0.260s
user 0m0.170s
sys 0m0.087s
[plegresl@bigbird release]$ export CUDA_FORCE_PTX_JIT=1
[plegresl@bigbird release]$ time ./simpleCUBLAS -noprompt
simpleCUBLAS test running…
PASSED
real 1m13.848s
user 1m13.005s
sys 0m0.833s
[plegresl@bigbird release]$ time ./simpleCUBLAS -noprompt
simpleCUBLAS test running…
PASSED
real 1m13.830s
user 1m12.981s
sys 0m0.837s
[/codebox]

Is this the expected behavior? It seems like if it were working properly, the third invocation would be as fast as the first.

Any answers?

Uh, unless I am crazy, CUDA_FORCE_PTX_JIT is still set to 1 in the third invocation.

(Alternatively: CUDA_FORCE_PTX_JIT actually forces a compile every time; it does not use the JIT cache.)
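To make the first point concrete: `export` puts a variable into the environment of every later command in that shell, so the third run above still had the JIT forced. A prefix assignment (`VAR=value command`) applies to that one command only. The sketch below is a minimal POSIX sh demonstration; `probe` is a hypothetical stand-in for `./simpleCUBLAS` that just prints what the child process sees.

```shell
#!/bin/sh
# Hypothetical stand-in for ./simpleCUBLAS: a child process that reports
# the value of CUDA_FORCE_PTX_JIT in its own environment.
probe='echo "child sees CUDA_FORCE_PTX_JIT=${CUDA_FORCE_PTX_JIT:-<unset>}"'

export CUDA_FORCE_PTX_JIT=1
sh -c "$probe"    # prints ...=1
sh -c "$probe"    # still 1: the export persists for every later command

unset CUDA_FORCE_PTX_JIT
CUDA_FORCE_PTX_JIT=1 sh -c "$probe"   # prints ...=1, for this command only
sh -c "$probe"                         # prints ...=<unset>
```

In the session above, `CUDA_FORCE_PTX_JIT=1 ./simpleCUBLAS -noprompt` followed by a plain `./simpleCUBLAS -noprompt` would keep the setting from leaking into the second run.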

I understand now. The documentation isn’t really correct:

[quote]When starting a CUDA application for the first time with the above environment flag, the CUDA driver will JIT compile the PTX for each CUDA kernel that is used into native CUBIN code. The generated CUBIN for the target GPU architecture is cached by the CUDA driver. This cache persists across system shutdown/restart events.[/quote]

It specifically says “first time”, which implied to me that on subsequent calls the cache would be used. Thanks, Tim.
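If you want to check whether the driver is writing a JIT cache at all, one option is to look at the cache directory on disk. This is a minimal sketch assuming the Linux default location documented for later CUDA releases (`~/.nv/ComputeCache`, overridable with `CUDA_CACHE_PATH`); the driver discussed in this thread may behave differently. The script only reports what is on disk; it cannot tell you whether a particular run read from the cache.

```shell
#!/bin/sh
# Assumption: later CUDA releases keep the JIT cache in ~/.nv/ComputeCache
# on Linux unless CUDA_CACHE_PATH points somewhere else.
cache_dir="${CUDA_CACHE_PATH:-$HOME/.nv/ComputeCache}"

if [ -d "$cache_dir" ]; then
  # Report the total size; compare before and after a forced-JIT run.
  echo "JIT cache at $cache_dir:"
  du -sh "$cache_dir"
else
  echo "no JIT cache directory at $cache_dir"
fi
```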