Cannot peek at last error after a call to a dlsym()-ed function

Greetings,

I’m looking for some advice. I have a regular host function that launches a trivial CUDA kernel. When I build the source targeting an architecture newer than my card’s, the kernel launch fails with error 209 (cudaErrorNoKernelImageForDevice). So far, so good (see the snippet below):

__global__ void noop() { }

extern "C" void entry_point() {
    noop<<<1, 1>>>();
    printf("Last error: %d\n", cudaPeekAtLastError());
}

If I build the same file as a shared library, load it with dlopen(), look up the host function with dlsym() and call it, the same error code is printed after the failed kernel launch, as expected:

...
void *shlib = dlopen("libcuda_error.so", RTLD_NOW);
void (*fn)(void) = (void (*)()) dlsym(shlib, "entry_point");
fn();

Now, here’s the issue. If I call cudaPeekAtLastError() in the caller, right after fn() returns, it returns 0 (cudaSuccess), as if the kernel launch had not failed. I’ve been scratching my head trying to understand what’s going on, but so far I haven’t found a reasonable explanation. Could someone shed some light on this?

I’m attaching a Python script that reproduces the problem (assuming your GPU is not a compute capability 8.x device, i.e. not in the compute_80 family). Any advice is very welcome.

Thanks!

cuda_error.py (1.2 KB)

Please post code inline. It facilitates forum discussion, as well as searching.

@Robert_Crovella Oh, absolutely. Here it is.

Below is the CUDA file, which I compile into a shared library with nvcc -Xcompiler -fPIC -shared -gencode arch=compute_80,code=sm_80 cuda_error.cu -o libcuda_error.so. The architecture mismatch error is expected because my laptop has a GeForce MX250 (Pascal architecture, compute capability 6.1). For completeness, I’m running CUDA 11.4 on Linux.

#include <stdio.h>

__global__ void noop() { }

extern "C" void entry_point() {
    noop<<<1, 1>>>();
    printf("SharedLib: last error = %d\n", cudaPeekAtLastError());
}

And this is the main C file, which loads the shared library, calls entry_point() and then peeks at the last error; that final cudaPeekAtLastError() call is the one that misses the error. I’m building it with nvcc cuda_error.c -o cuda_error -ldl:

#include <stdio.h>
#include <dlfcn.h>
#include <cuda_runtime.h>

int main(void) {
    void *so = dlopen("libcuda_error.so", RTLD_NOW);
    void (*fn)(void) = (void (*)(void)) dlsym(so, "entry_point");
    fn();
    printf("Caller: last error = %d\n", cudaPeekAtLastError());
    return 0;
}

The output I get with ./cuda_error is:

SharedLib: last error = 209
Caller: last error = 0

Thanks again!
Lucas

I don’t think I am able to offer anything very constructive at this time, or explain exactly what is happening. You may wish to file a bug.


Thanks @Robert_Crovella, I’ll follow your suggestion.