__cudaLaunch static pointer causes data race warning

Hi forum,

We’ve been bugged with issue for a long time:
nvcc generates a cudafe stub.c that invokes the macro __cudaLaunch, which in itself defines a volatile static char* __f marked with unused attribute, and has its value set in the same line. That’s it. The __f is never used within this macro and IIUC, the macro defines a scope that confines the visibility of this __f.

However, the static keyword is occasionally giving us tsan warnings (WAW) and it’s quite easy to see why: two threads calling the same kernel will write the same value to the same pointer and there’s no guarantee which will write last, although the values written to it are the same, tsan still is quite unhappy.

We’re thinking of patching this line from targets/x86_64-linux/include/crt/host_runtime.h by removing the static. But we fear the side effect that we might not know.

Q:

  1. What is the purpose of this __f.
  2. If we patch the __f to be volatile char* __f, is there going to be an issue?

Mengda

/* the use of __args_idx in the expression below avoids host compiler warning about it being an
   unused variable when the launch has no arguments */
#define __cudaLaunch(fun) \
        { volatile static char *__f __NV_ATTR_UNUSED_FOR_LAUNCH;  __f = fun; \
          dim3 __gridDim, __blockDim;\
          size_t __sharedMem; \
          cudaStream_t __stream; \
          if (__cudaPopCallConfiguration(&__gridDim, &__blockDim, &__sharedMem, &__stream) != cudaSuccess) \
            return; \
          if (__args_idx == 0) {\
            (void)cudaLaunchKernel(fun, __gridDim, __blockDim, &__args_arr[__args_idx], __sharedMem, __stream);\
          } else { \
            (void)cudaLaunchKernel(fun, __gridDim, __blockDim, &__args_arr[0], __sharedMem, __stream);\
          }\
        }
$ /usr/local/cuda-11.2/bin/nvcc --version                                
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Jan_28_19:32:09_PST_2021
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0

Gpu: Quadro RTX 5000
Cpu: x86_64, i9-9900K
Ubuntu 18.04