Compiling and running PTX code via CUDA’s driver-level API (cuLinkComplete) involves an on-disk cache that avoids the costly optimization step when the same kernel is run again in a subsequent program launch.
This is generally great and probably the right thing to do for 99% of use cases. In my project, however, I need to store extra information along with the compiled kernel, which forced me to implement my own caching solution.
The slightly annoying part is that there are now two redundant levels of on-disk caching – one done by my library, and one done by the CUDA driver – and I would like to disable the CUDA one. This turns out to be possible: there is a CUDA_CACHE_DISABLE environment variable that I can set on the command line.
But this flag is far too coarse: it turns off the CUDA cache for everything happening in the process. This does not work well when multiple projects (e.g. my library and PyTorch) are imported into the same process, which might be a Python interpreter. Each component has different needs with regard to this caching feature, so an environment variable is not the right level of abstraction.
I don’t think that this feature exists at the moment (and I would be delighted to find out that it does), so this is likely a feature request: it would be great if there were a flag, say, CU_JIT_USE_DISK_CACHE, that I could provide to cuLinkCreate() to specify exactly what it should do in that specific place.
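To make the request concrete, here is a hypothetical sketch of what such a per-link-state override could look like. CU_JIT_USE_DISK_CACHE does not exist in the current CUjit_option enumeration, and the value encoding is pure speculation on my part:

```
/* Hypothetical: opt out of the driver's disk cache for this
   link state only, leaving other users of the process unaffected. */
CUjit_option opts[] = { CU_JIT_USE_DISK_CACHE };   /* proposed, not a real flag */
void       *vals[] = { (void *) 0 };               /* 0 = disable, 1 = enable  */

CUlinkState state;
cuLinkCreate(1, opts, vals, &state);
/* ... cuLinkAddData(), cuLinkComplete(), etc. as usual ... */
```

This follows the existing convention of the cuLinkCreate() options array, where each CUjit_option is paired with a value passed through a void pointer, so the addition would be a natural fit for the current API surface.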