Printf: invalid device function (Linux)

While porting my program to Linux I’ve discovered what I believed to be a bug in CUDA.
A simple kernel with a single printf statement results in an invalid device function error on Linux, while it works fine on Windows.
The cause seems to be the creation of a cudaStream_t elsewhere in the program, in the constructor of a global variable.

I’ve been able to create this small reproducible program consisting of two files:
fileA.cpp

#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

struct Foo {
    cudaStream_t x;
    Foo()
    {
        cudaStreamCreate(&x);
    }
    ~Foo()
    {
        cudaStreamDestroy(x);
    }
};
Foo bar;

void runHelloWorld();

int main() {
    runHelloWorld();
}

fileB.cpp

#include <cstdio>
#include <cstdlib> // std::abort
#include <cuda.h>
#include <cuda_runtime.h>

void __cudaCheckError(const char* file, unsigned line) {
    cudaDeviceSynchronize();
    cudaError_t err = cudaGetLastError();
    if (cudaSuccess != err) {
        std::fprintf(stderr, "ERROR %s:%u: cuda error \"%s\"\n", file, line, cudaGetErrorString(err));
        std::abort();
    }
}
#define CUDA_CHECK_ERROR() __cudaCheckError(__FILE__, __LINE__)
    

__global__ void helloWorld() {
    printf("Hello World!\n");
}

void runHelloWorld() {
    CUDA_CHECK_ERROR(); // Make sure that no error occurred before the kernel invocation
    helloWorld<<<1, 1>>>();
    CUDA_CHECK_ERROR(); // <=== invalid device function error found here.
    cudaDeviceSynchronize();
}

Merging the code into a single (CUDA) file fixes the error, as does creating the stream inside main() instead of in the constructor of a global variable.

System information:

Operating System: PopOS 22.04
Driver: 515.48.07
CUDA Toolkit: 11.7

  • If it were me, I would name fileB.cpp fileB.cu. Ordinarily, CUDA-specific constructs don’t belong in .cpp files. Yes, I am aware you can work around this via compile switches, but you haven’t shown the compilation command, and the world + dog generally expects nvcc ..., not nvcc -x cu .... It’s a nitpick.

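  • CUDA runtime API functions should not be called outside of main scope, that is, before main() starts or after it returns (which is when the constructors and destructors of global objects run). This is due to CUDA lazy initialization.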

If I modify your fileA.cpp file as follows:

#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

struct Foo {
    cudaStream_t x;
    Foo()
    {
        cudaStreamCreate(&x);
    }
    ~Foo()
    {
        cudaStreamDestroy(x);
    }
};
//Foo bar;

void runHelloWorld();

int main() {
    Foo bar;
    runHelloWorld();
}

then the error goes away for me. Yes, I understand you said it works on Windows. The nature of lazy initialization is such that behavior can vary. To be safe, you should avoid calling CUDA API functions outside of main scope. A writeup discussing some mechanics of this can be found here. (I’m not suggesting that article gives a precise description of what is happening here; it does not. I’m suggesting that it helps to illuminate a bit what lazy initialization involves.)
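
If the handle really has to stay reachable as a global, one possible way to follow the advice above is to make the global's constructor do nothing and defer creation to first use inside main(), with an explicit release before main() returns. The following is only a sketch of that pattern in plain C++ with a stand-in resource, so it runs without a GPU. DeferredResource, fakeCreate, fakeDestroy, g_liveHandles and runWithDeferredResource are hypothetical names, not part of CUDA; in the program from this thread, T would presumably be cudaStream_t and the two callbacks would wrap cudaStreamCreate and cudaStreamDestroy.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper (not part of CUDA): holds a handle of type T but creates
// it only on first use, so a global instance does no resource work before main().
template <typename T>
class DeferredResource {
public:
    DeferredResource(std::function<T()> create, std::function<void(T)> destroy)
        : create_(std::move(create)), destroy_(std::move(destroy)) {}

    // Non-copyable: exactly one owner for the underlying handle.
    DeferredResource(const DeferredResource&) = delete;
    DeferredResource& operator=(const DeferredResource&) = delete;

    // Deliberately does not destroy the handle: a global's destructor runs
    // after main() has returned, which is what this pattern tries to avoid.
    ~DeferredResource() = default;

    // Creates the handle on the first call, then returns the cached one.
    T& get() {
        if (!created_) {
            handle_ = create_();
            created_ = true;
        }
        return handle_;
    }

    // Call before main() returns. Safe to call more than once.
    void release() {
        if (created_) {
            destroy_(handle_);
            created_ = false;
        }
    }

    bool created() const { return created_; }

private:
    std::function<T()> create_;
    std::function<void(T)> destroy_;
    T handle_{};
    bool created_ = false;
};

// Stand-in resource (assumption for illustration only): counts live handles.
static int g_liveHandles = 0;
int fakeCreate() { ++g_liveHandles; return 42; }
void fakeDestroy(int) { --g_liveHandles; }

// Still a global, as in the original fileA.cpp, but nothing is created here.
DeferredResource<int> bar(fakeCreate, fakeDestroy);

// Mirrors what main() would do: first use creates the handle, and the
// explicit release() destroys it while main() is still running.
int runWithDeferredResource() {
    int handle = bar.get();  // created here, not during static initialization
    assert(bar.created());
    // ... work that needs the handle (e.g. kernel launches) would go here ...
    bar.release();           // destroyed here, not during static teardown
    return handle;
}
```

Calling runWithDeferredResource() from main() keeps both creation and destruction inside main scope. The trade-off in this sketch is that forgetting release() leaks the handle rather than destroying it after main() has returned.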
