I’m attempting to use dynamic parallelism in CUDA:
- first, compile a kernel that launches a child kernel into a fatbin
- then, load the kernel function from the fatbin via the CUDA driver API

However, when executing, I get the error CUDA_ERROR_NOT_FOUND (500).
Below is a minimal example that reproduces the problem.
File cdp_kernel.cu:
```cpp
extern "C" __global__ void childKernel() {
    int i = 0;
}

extern "C" __global__ void parentKernel() {
    childKernel<<<1, 1>>>();
}
```
Compile cdp_kernel.cu into a fatbin:
```
nvcc --fatbin -rdc=true -lcudadevrt -o cdp_kernel.fatbin cdp_kernel.cu
```
File main.cu, which retrieves parentKernel from cdp_kernel.fatbin:
```cpp
#include <cuda.h>
#include <cstdlib>
#include <iostream>

#define Errchk(ans) { DrvAssert((ans), __FILE__, __LINE__); }

inline void DrvAssert(CUresult code, const char *file, int line)
{
    if (code != CUDA_SUCCESS) {
        std::cout << "Error: " << code << " " << file << "@" << line << std::endl;
        const char* msg;
        cuGetErrorName(code, &msg);
        std::cout << msg << std::endl;
        exit(code);
    }
}

int main() {
    CUcontext context;
    CUdevice device;
    Errchk(cuInit(0));
    Errchk(cuDeviceGet(&device, 0));
    Errchk(cuCtxCreate(&context, 0, device));

    CUmodule module;
    Errchk(cuModuleLoad(&module, "cdp_kernel.fatbin"));

    CUfunction kernel;
    Errchk(cuModuleGetFunction(&kernel, module, "childKernel"));
    Errchk(cuModuleGetFunction(&kernel, module, "parentKernel"));

    cuModuleUnload(module);
    cuCtxDestroy(context);
    return 0;
}
```
Compile main.cu into an executable:
```
nvcc -o main main.cu -lcuda
```
Run the executable:
```
./main
```
The following error appears:
```
Error: 500 main.cu@30
CUDA_ERROR_NOT_FOUND
```
In main.cu, I call cuModuleGetFunction twice:
- line 29: loading childKernel works fine
- line 30: loading parentKernel fails with CUDA_ERROR_NOT_FOUND
Expected:
Both parentKernel and its child kernel should be loaded.
Actual result:
Only the child kernel appears to be loaded; the parent kernel does not.
Please let me know if more information is needed.
Thanks in advance for any help!