Do all custom kernels need to be pre-compiled into a .ptx file before they can be loaded with cuModuleLoad?
Yes, the PTX file contains the kernel instructions for the GPU (in an intermediate form that the driver compiles for the specific GPU when the module is loaded). Just as you generate an exe for a CPU from a C/C++ program by using a compiler, you generate PTX code for the GPU with the nvcc compiler. You must use a different compiler for the GPU because the hardware is quite different.
Thank you for the answer. What I’m asking specifically is how I can define a custom kernel in a .cuh file, include it directly, and call it through ComputeKernel (e.g. via a function pointer), without precompiling it to a .ptx file and then referring to it by file name in a CUmodule.
There is a way: you may be interested in NVRTC, NVIDIA’s run-time compilation library.
Please note that you should carefully consider whether this approach is desirable. The run-time compiler may add considerable size to your binaries (or download), since you have to ship the NVRTC compiler library with your app. Also, run-time compilation can make the app’s loading time considerably longer. However, it can be useful, for example, when you want your own third-party users to write kernels at run time or through a visual interface.
Thank you, that’s very informative.