The driver caches the binary code generated after JIT compiling PTX. I want the driver to not cache the generated code, and preferably not do any disk activity at all. Is it possible?
To avoid JIT compilation of PTX, make sure your build incorporates SASS (machine code) for all desired target platforms into the fat binary it produces. When the driver loads a fat binary, it will first search the binary for SASS matching the currently bound GPU’s architecture. If it cannot find such code, it will look for suitable PTX for JIT compilation to SASS. If that fails, it returns an error.
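As a rough illustration of "incorporate SASS for all desired targets", here is a small sketch that builds the nvcc `-gencode` flags for a list of architectures. The helper name `gencode_flags`, the architecture list (70, 80), and the file names `app` / `app.cu` are my own placeholders, not something from the original build; adjust them to the GPUs you actually deploy on.

```python
def gencode_flags(sm_archs, embed_ptx=False):
    """Return nvcc -gencode flags embedding SASS for each architecture.

    sm_archs  -- compute capabilities as integers, e.g. 70 for sm_70
                 (hypothetical example values, pick your own targets)
    embed_ptx -- if True, also embed PTX for the newest listed architecture;
                 that PTX is what the driver would JIT-compile on a GPU
                 with no matching SASS, so leave it off to rule out JIT
    """
    flags = []
    for arch in sorted(set(sm_archs)):
        # code=sm_XX asks nvcc to embed machine code (SASS) for that target
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    if embed_ptx and sm_archs:
        newest = max(sm_archs)
        # code=compute_XX asks nvcc to embed PTX instead of SASS
        flags += ["-gencode", f"arch=compute_{newest},code=compute_{newest}"]
    return flags


if __name__ == "__main__":
    # Print a full command line for a hypothetical single-file project.
    print(" ".join(["nvcc", *gencode_flags([70, 80]), "-o", "app", "app.cu"]))
```

With no PTX embedded, a GPU that has no matching SASS in the binary gets a load error rather than a JIT compile, which may or may not be what you want.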
The CUDA C Programming Guide also mentions an environment variable, CUDA_CACHE_DISABLE (see the section “Just-in-Time Compilation”).
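If you cannot set the variable globally, one option is to set it only for the CUDA process you launch. A minimal sketch, assuming the application is started from a Python wrapper; the function names are made up for illustration, and the child command in the demo is just a stand-in for a real CUDA executable.

```python
import os
import subprocess
import sys


def cuda_env_without_jit_cache(base_env=None):
    """Return a copy of the environment with CUDA_CACHE_DISABLE set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_CACHE_DISABLE"] = "1"  # variable named in the Programming Guide
    return env


def run_without_jit_cache(cmd):
    """Run cmd (a list of arguments) with the JIT cache disabled for it only."""
    result = subprocess.run(cmd, env=cuda_env_without_jit_cache(), check=False)
    return result.returncode


if __name__ == "__main__":
    # Stand-in child process that only echoes the variable; replace this
    # with the path to your own CUDA application.
    child = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_CACHE_DISABLE'])"]
    run_without_jit_cache(child)
```

Setting the variable in the child's environment before it starts avoids any question of whether the driver has already been initialized when the variable is set.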
FYI, under some circumstances building in the SASS does not prevent all disk activity. I got it to work for small applications, but for larger applications the driver still tries to lock the cache index file for some reason.
I know because this triggers a bug in our NFS appliance that puts the file lock into an infinite loop. CUDA_CACHE_DISABLE=1 thankfully prevents it from even attempting to lock the index.
Interesting. I was not aware of this behavior, and do not know if there is a solid technical reason behind it or whether it may be unintentional. If this is troublesome, I would suggest filing a bug / enhancement request against the CUDA driver. At least the environment variable appears to provide a workaround.
Yeah, I did that. In my particular case, the bug was resolved as “Not an NV bug”. As far as I know we’ve still got an open bug with the vendor of the NFS appliance, whose bug system appears to be a simple black hole. File locks are causing problems for many other applications too, so I’m OK with NVIDIA’s response. I would have preferred an update so that CUDA simply reported an error instead of waiting indefinitely for the lock.
Thanks everyone. I am now using CUDA_CACHE_DISABLE=1 and that appears to work.