Error loading a dynamic parallelism kernel from a fatbin via the CUDA driver API

I’m attempting to use dynamic parallelism in CUDA.

  • First, I compile a kernel that launches a child kernel into a fatbin.
  • Then, I load the kernel function from the fatbin via the CUDA driver API.
  • However, at run time I get the error CUDA_ERROR_NOT_FOUND (500).

Below is a minimal example that reproduces the problem.

File cdp_kernel.cu:

extern "C" __global__ void childKernel() {
    int i = 0;
}

extern "C" __global__ void parentKernel() {
    childKernel<<<1, 1>>>();
}

Compile cdp_kernel.cu into a fatbin:

nvcc --fatbin -rdc=true -lcudadevrt -o cdp_kernel.fatbin cdp_kernel.cu
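A note on this command: --fatbin stops before any device-link step, so the -lcudadevrt flag most likely has no effect here. The fatbin then still holds relocatable code in which parentKernel's device-side launch refers to unresolved CUDA device runtime (cudadevrt) symbols. If the device runtime is instead supplied at load time (for example through the driver's JIT linker with cuLinkAddFile and CU_JIT_INPUT_LIBRARY), the program needs the path to the static library. The sketch below builds that path; cudadevrtPath and cudadevrtPathFromEnv are hypothetical helper names, and the layout (<root>/lib64/libcudadevrt.a, with <root> taken from CUDA_HOME or defaulting to /usr/local/cuda) is an assumption about a typical Linux install.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: builds the expected path of the static device runtime.
// Assumes a Linux toolkit layout, <root>/lib64/libcudadevrt.a, where <root> is
// the given CUDA_HOME value or, if that is null or empty, /usr/local/cuda.
std::string cudadevrtPath(const char* cudaHome) {
    std::string root = (cudaHome != nullptr && *cudaHome != '\0')
                           ? std::string(cudaHome)
                           : std::string("/usr/local/cuda");
    if (root.back() != '/') root += '/';  // avoid a doubled or missing slash
    return root + "lib64/libcudadevrt.a";
}

// Convenience wrapper that reads CUDA_HOME from the environment.
std::string cudadevrtPathFromEnv() {
    return cudadevrtPath(std::getenv("CUDA_HOME"));
}
```

On Windows the library is named cudadevrt.lib and lives elsewhere, so the default would need adjusting there.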

File main.cu, which retrieves parentKernel from cdp_kernel.fatbin:

#include <cuda.h>
#include <cstdlib>   // exit
#include <iostream>

#define Errchk(ans) { DrvAssert((ans), __FILE__, __LINE__); }
inline void DrvAssert( CUresult code, const char *file, int line)
{
    if (code != CUDA_SUCCESS) {
        std::cout << "Error: " << code << " " <<  file << "@" << line << std::endl;
        const char* msg;
        cuGetErrorName(code, &msg);
        std::cout << msg << std::endl;
        exit(code);
    }
}

int main() {
    CUcontext context;
    CUdevice device;
    Errchk(cuInit(0));
    Errchk(cuDeviceGet(&device, 0));
    Errchk(cuCtxCreate(&context, 0, device));
    
    CUmodule module;
    Errchk(cuModuleLoad(&module, "cdp_kernel.fatbin"));
    
    CUfunction kernel;
    Errchk(cuModuleGetFunction(&kernel, module, "childKernel"));   // succeeds
    Errchk(cuModuleGetFunction(&kernel, module, "parentKernel"));  // fails: CUDA_ERROR_NOT_FOUND

    cuModuleUnload(module);
    cuCtxDestroy(context);
    return 0;
}

Compile main.cu into an executable:

nvcc -o main main.cu -lcuda

Run the executable:

./main

The following error appears:

Error: 500 main.cu@30
CUDA_ERROR_NOT_FOUND

In main.cu, I call cuModuleGetFunction twice:

  • line 29: retrieving childKernel works fine;
  • line 30: retrieving parentKernel fails with CUDA_ERROR_NOT_FOUND.

Expected result:
Both parentKernel and its child kernel, childKernel, should be found in the module.

Actual result:
Only childKernel is found; parentKernel is not.

Please let me know if more information is needed.
Thanks in advance for any help!

https://stackoverflow.com/questions/69133608/error-compiling-cuda-dynamic-parallelism-code-with-driver-api


Thank you so much for the help! Your suggestion solved my problem, and I also learned a lot in the process. Thanks again!