Error loading dynamic parallelism kernel from fatbin via CUDA driver API

chyan_lea · May 8, 2025, 3:46am

I’m attempting to use dynamic parallelism in CUDA:

1. First, I compile a kernel that launches a child kernel into a fatbin.
2. Then, I load the kernel function from the fatbin via the CUDA driver API.

However, at run time I get the error CUDA_ERROR_NOT_FOUND (500).

Below is the example to reproduce the problem.

File cdp_kernel.cu:

extern "C" __global__ void childKernel() {
    int i = 0;
}

extern "C" __global__ void parentKernel() {
    childKernel<<<1, 1>>>();
}

Compile cdp_kernel.cu into a fatbin:

nvcc --fatbin -rdc=true -lcudadevrt -o cdp_kernel.fatbin cdp_kernel.cu

File main.cu, which retrieves parentKernel from cdp_kernel.fatbin:

#include <cuda.h>
#include <iostream>

#define Errchk(ans) { DrvAssert((ans), __FILE__, __LINE__); }
inline void DrvAssert( CUresult code, const char *file, int line)
{
    if (code != CUDA_SUCCESS) {
        std::cout << "Error: " << code << " " <<  file << "@" << line << std::endl;
        const char* msg;
        cuGetErrorName(code, &msg);
        std::cout << msg << std::endl;
        exit(code);
    }
}

int main() {
    CUcontext context;
    CUdevice device;
    Errchk(cuInit(0));
    Errchk(cuDeviceGet(&device, 0));
    Errchk(cuCtxCreate(&context, 0, device));
    
    CUmodule module;
    Errchk(cuModuleLoad(&module, "cdp_kernel.fatbin"));
    
    CUfunction kernel;
    Errchk(cuModuleGetFunction(&kernel, module, "childKernel"));
    Errchk(cuModuleGetFunction(&kernel, module, "parentKernel"));

    cuModuleUnload(module);
    cuCtxDestroy(context);
    return 0;
}

Compile main.cu into an executable:

nvcc -o main main.cu -lcuda

Run the executable:

./main

The following error appears:

Error: 500 main.cu@30
CUDA_ERROR_NOT_FOUND

In main.cu, I call cuModuleGetFunction twice:

line 29: loading the kernel childKernel works fine
line 30: loading the kernel parentKernel fails with CUDA_ERROR_NOT_FOUND

Expected result:
Both parentKernel and its child kernel should be loaded.

Actual result:
Only the child kernel appears to be loaded; the parent kernel is not.

Please let me know if more information is needed.
Thanks in advance for any help!

Robert_Crovella · May 8, 2025, 3:38pm

https://stackoverflow.com/questions/69133608/error-compiling-cuda-dynamic-parallelism-code-with-driver-api
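
For readers who land here, the following is a minimal sketch of the kind of build change that typically resolves this. It is written under assumptions and is not quoted from the linked answer. With --fatbin, nvcc stops before any link step, so the -lcudadevrt in the original command most likely has no effect, and the relocatable device code is never linked against the device runtime. One way around this is to compile to a relocatable object first and then device-link that object into a fatbin. The sm_70 architecture and the intermediate file name cdp_kernel.o are placeholders and not from the original post; substitute the compute capability of your own GPU.

```shell
# Assumption: sm_70 is a placeholder; use your GPU's compute capability.
ARCH=sm_70

# Step 1: compile to a relocatable device object (device code not yet linked).
nvcc -arch="$ARCH" -rdc=true -c -o cdp_kernel.o cdp_kernel.cu

# Step 2: device-link against the device runtime (cudadevrt) and emit a fatbin
# that cuModuleLoad can consume.
nvcc -arch="$ARCH" -dlink -fatbin -o cdp_kernel.fatbin cdp_kernel.o -lcudadevrt
```

After rebuilding the fatbin this way, main.cu should not need any changes; both cuModuleGetFunction calls are expected to succeed if the device link resolved the device runtime symbols.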

chyan_lea · May 9, 2025, 1:34am

Thank you so much for the help! Your suggestion solved my problem and I also learned a lot in the process—thanks again!
