CUDA Pro Tip: Understand Fat Binaries and JIT Caching

Originally published at:

As NVIDIA GPUs evolve to support new features, the instruction set architecture naturally changes. Because applications must run on multiple generations of GPUs, the NVIDIA compiler tool chain supports compiling for multiple architectures in the same application executable or library. CUDA also relies on the PTX virtual GPU ISA to provide forward compatibility, so that already…

Another potential problem: suppose a 2-parameter kernel function is in the cache while you are developing a 3-parameter version of it. If your test suite still calls the obsolete 2-parameter function, the call still works even though it should not.

My Pro Tip: disable the CUDA JIT cache during development, and enable it only in production.
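
For what it's worth, here is a minimal sketch of how I would wire that tip into a host program. The JIT cache is controlled by the `CUDA_CACHE_DISABLE` environment variable, and setting it to `1` turns caching off. The simplest route is to export it in the shell before launching the application. The helper below does the same thing from inside the process. `configure_jit_cache` and its `production` flag are hypothetical names of mine, not part of any CUDA API, and I am assuming the variable is read when CUDA initializes, so the call has to happen before the first CUDA call. `setenv` is POSIX, so this will not compile as-is on Windows.

```cpp
#include <cstdlib>   // setenv (POSIX), std::getenv
#include <cstring>   // std::strcmp

// Hypothetical helper (not a CUDA API): disable the CUDA JIT cache for this
// process during development, and leave the environment alone in production.
// Assumption: must be called before the first CUDA runtime/driver call,
// since the variable is presumably read at CUDA initialization.
// Returns true on success.
bool configure_jit_cache(bool production) {
    if (production) {
        return true;  // keep whatever the environment already says
    }
    // CUDA_CACHE_DISABLE=1 disables JIT caching; overwrite any existing value.
    return setenv("CUDA_CACHE_DISABLE", "1", /*overwrite=*/1) == 0;
}

// Small check for logging or tests: is the cache disabled in this process?
bool jit_cache_disabled() {
    const char* value = std::getenv("CUDA_CACHE_DISABLE");
    return value != nullptr && std::strcmp(value, "1") == 0;
}
```

Call `configure_jit_cache(false)` as the first statement of `main` in development builds. In production, pass `true` so the cache stays enabled and startup does not pay the JIT cost on every run.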