CUDA Context-Independent Module Loading

Originally published at: https://developer.nvidia.com/blog/cuda-context-independent-module-loading/

Most CUDA developers are familiar with the cuModuleLoad API and its counterparts for loading a module containing device code into a CUDA context. In most cases, you want to load identical device code on all devices. This requires loading device code into each CUDA context explicitly. Moreover, libraries and frameworks that do not control context creation and…