Strange bug with __CUDA_ARCH__ and kernel template implicit instantiation

ptalbot.gsoc · June 18, 2021, 10:46pm

Hello guys,

I ran into something strange, which I think is a bug. Here is a minimal example:

#include <cstdio>

template <typename T>
__global__ void kernel(T* array) {}

template<typename T>
class Array {
  T* data;
public:
  Array(size_t n) {
    #ifndef __CUDA_ARCH__
    // It works with any other undefined macro, e.g.:
    // #ifndef __XXXUISTISTUISTUSI__
      printf("yo\n");
      kernel<<<1,1>>>(data);
    #endif
  }
};

int main() {
  Array<int> arr(1);
  return 0;
}

The error reported by cuda-memcheck is: “Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaLaunchKernel.”

If you change the #ifndef to test any macro other than __CUDA_ARCH__, the kernel launches without error.

This minimal code doesn’t make much sense on its own, but in my real code I use __CUDA_ARCH__ to distinguish between host and device code in __device__ __host__ functions.
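For context, the pattern I mean looks roughly like the sketch below. It is only an illustration: `in_device_code` is a made-up name, and the `#ifndef __CUDACC__` shim is an assumption added so the same file also builds with a plain C++ compiler.

```cpp
#include <cstdio>

// Assumption: when nvcc is not the compiler, define the CUDA qualifiers
// away so this file is also valid plain C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: tells which compilation pass produced this code.
// __CUDA_ARCH__ is defined only while nvcc compiles the device-side code.
__host__ __device__ inline bool in_device_code() {
#ifdef __CUDA_ARCH__
  return true;   // device pass
#else
  return false;  // host pass, or a non-CUDA compiler
#endif
}

// Hypothetical __host__ __device__ function that branches on the helper.
__host__ __device__ inline void report() {
  if (in_device_code()) {
    printf("running device code\n");
  } else {
    printf("running host code\n");
  }
}
```

Called from `main`, `report()` takes the host branch. Note that in this sketch the macro only guards a return value; no kernel launch or template instantiation sits under the `#ifdef`, unlike in my reproducer above.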

Other info:

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |

If you think it’s a bug too, I’ll report it.

Cheers!
