Hello guys,
I ran into something strange, which I think is a bug. Here is a minimal example:
#include <cstdio>

template <typename T>
__global__ void kernel(T* array) {}

template <typename T>
class Array {
    T* data;

public:
    Array(size_t n) {
#ifndef __CUDA_ARCH__
// Works with any other undefined macro, e.g.:
// #ifndef __XXXUISTISTUISTUSI__
        printf("yo\n");
        kernel<<<1, 1>>>(data);
#endif
    }
};

int main() {
    Array<int> arr(1);
    return 0;
}
Running it under cuda-memcheck reports this error:
Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaLaunchKernel.
If you change the #ifndef to test any macro other than __CUDA_ARCH__, it works.
This minimal example doesn't make much sense on its own, but I'm using __CUDA_ARCH__ to distinguish between host and device code in __device__ __host__ functions.
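For readers unfamiliar with that pattern, here is a rough sketch of what it typically looks like. The names HOST_DEVICE, compiled_for_device and report are hypothetical and do not come from my real code. The fallback macro is an assumption added so the snippet also builds with a plain host compiler. Note that the guard here only wraps ordinary statements inside a function body, not a kernel launch.

```cpp
#include <cstdio>

// Hypothetical macro (not part of the original example): expands to the CUDA
// qualifiers when compiled by nvcc and to nothing with a plain host compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: tells which compilation pass produced this body.
// __CUDA_ARCH__ is defined only while nvcc compiles device code.
HOST_DEVICE inline bool compiled_for_device() {
#ifdef __CUDA_ARCH__
    return true;   // device compilation pass
#else
    return false;  // host compilation pass, or a non-CUDA compiler
#endif
}

// Hypothetical helper: one function with a host branch and a device branch.
HOST_DEVICE inline void report() {
#ifdef __CUDA_ARCH__
    printf("running device code\n");
#else
    printf("running host code\n");
#endif
}
```

Called from main() on the host, compiled_for_device() returns false and report() takes the host branch. The same functions can be called from a kernel when built with nvcc, where the device branch is compiled instead.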
Other info:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
If you think it’s a bug too, I’ll report it.
Cheers!