uh, CUDA_FORCE_PTX_JIT is still set to 1 in the third invocation unless I am crazy.
(alternatively: CUDA_FORCE_PTX_JIT actually forces a compile; it does not use the JIT cache.)