Hello! First time posting in this forum so let me know if I am missing anything. I have been trying to find a solution to this issue and haven’t found a match online.
I currently have two files with kernels (kernel1.cu and kernel2.cu) and may add more. Both kernels are launched from another CUDA file (manager.cu), which handles all of the device allocation and the copies in and out. Both kernel files also share constants, so they both include a common header (constants.cuh). However, some of these constants are not known until runtime, so I was hoping to set them from the manager with cudaMemcpyToSymbol. Here is a simplified layout of the code:
constants.cuh
#pragma once
__device__ __constant__ int testconst;
The manager.h and kernel1.h files simply declare the testwrapper and testkernel functions.
When testwrapper is run, testconst never gets set and the printf outputs 0. If I give testconst a value directly in the constants file, it is read back correctly. It also works if everything is combined into one file, but then kernel2 cannot reuse the constant. Any ideas for how to get the copy to work in this sort of layout?
I am confused. There does not seem to be a kernel2 in the posted code? What is manager.h? Is it needed to reproduce your observations?
It would also be useful if you could share the exact command lines used to invoke nvcc (and the linker, if that is run as a separate step). When you run a failing case under control of compute-sanitizer, are any errors reported?
As indicated by njuffa, I consider it good practice to provide complete test cases when asking questions here. The simpler the better.
When using a global symbol in more than one compilation unit in C or C++, you should go through a set of thought processes to make sure your usage is correct.
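A __device__ symbol defined in one compilation unit is not the same object as an identically named __device__ symbol defined in another compilation unit, unless you take explicit steps to make it so. Because constants.cuh defines testconst, every .cu file that includes it gets its own separate copy, and cudaMemcpyToSymbol in manager.cu only writes to the manager's copy. Making them one symbol will involve: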
proper use of the C++ extern keyword
in the case of CUDA __device__ (or, equivalently, __constant__), proper use of relocatable device code with device linking
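A minimal sketch of those two steps, assuming the layout from the question. It is collapsed into one file so it can be built on its own, with comments marking which file each piece would live in. The names testconst, testwrapper and testkernel come from the question; the signatures, the output pointer (used here in place of the original printf) and the error handling are assumptions made for illustration.

```cuda
// Build with relocatable device code, e.g.: nvcc -rdc=true -c sketch.cu
#include <cstdio>
#include <cuda_runtime.h>

// ---- constants.cuh: declaration only, shared by every .cu file ----
extern __constant__ int testconst;

// ---- kernel1.cu: reads the shared constant ----
// Assumption: writes the value to an output pointer instead of printing it.
__global__ void testkernel(int *out) { *out = testconst; }

// ---- manager.cu: the single definition of the symbol ----
__constant__ int testconst;

// Sets the constant at runtime, launches the kernel, and returns the value
// the kernel observed (or -1 if any CUDA call fails).
int testwrapper(int value) {
    int *d_out = nullptr;
    int result = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;

    cudaError_t err = cudaMemcpyToSymbol(testconst, &value, sizeof(int));
    if (err == cudaSuccess) {
        testkernel<<<1, 1>>>(d_out);
        // cudaMemcpy on the default stream waits for the kernel to finish.
        err = cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        result = -1;
    }
    cudaFree(d_out);
    return result;
}
```

Calling testwrapper from host code and comparing the return value with the argument is a quick way to check that the kernel and the manager see the same symbol. In the real multi-file layout, each .cu file would be compiled with -rdc=true and the objects linked through nvcc so that device linking takes place.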
Thank you for the help and the feedback! My apologies for not including the rest of the files. Robert's answer solved my problem once I had gone through the resources and applied them to my situation. For those with a similar problem, it came down to:
Declaring the constant with extern in the constants file (constants.cuh)
Defining the constant in the manager.cu file
Passing the -rdc=true flag to nvcc when compiling and, since I am using CMake, setting the CUDA_SEPARABLE_COMPILATION target property to ON
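For the CMake part of the last step, a fragment along these lines may be enough. The project name, the target name myapp and the main.cpp entry are placeholders and not taken from the original project; only the .cu file names and the CUDA_SEPARABLE_COMPILATION property come from this thread.

```cmake
cmake_minimum_required(VERSION 3.18)
project(constdemo LANGUAGES CXX CUDA)   # hypothetical project name

# Hypothetical target; main.cpp stands in for whatever calls testwrapper.
add_executable(myapp main.cpp manager.cu kernel1.cu kernel2.cu)

# Compile the CUDA sources as relocatable device code and device-link them,
# the CMake counterpart of passing -rdc=true to nvcc.
set_target_properties(myapp PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
```

Setting the property per target keeps separable compilation limited to the targets that actually share device symbols across files.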