I want to use constant memory, but I found a problem.
If I define a constant symbol in a .cu file and want to initialize it from a .cpp file, I have to pass its size and pointer through a function, such as:
Yes. The __constant__ syntax is not something that can be processed by the host compiler. When your project contains a file named filename.cpp, that file is processed by the host compiler by default, and the host compiler does not understand __constant__ syntax.
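A minimal sketch of that arrangement, assuming a hypothetical 8-entry table named c_table: everything that touches __constant__ stays in the .cu file, and the .cpp side only sees two functions with plain C++ signatures (set_table and run_apply_table, both hypothetical names), so the host compiler never encounters __constant__.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// ---- would live in scale.cu (compiled by nvcc) ----
constexpr int kTableSize = 8;              // hypothetical table size
__constant__ float c_table[kTableSize];    // hypothetical constant symbol

__global__ void apply_table(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * c_table[i % kTableSize];
}

// Plain C++ signatures: a header declaring these two functions can be
// included from a .cpp file without exposing the __constant__ symbol.
cudaError_t set_table(const float* host, size_t count)
{
    if (count > static_cast<size_t>(kTableSize)) return cudaErrorInvalidValue;
    // Size (in bytes) and source pointer are passed explicitly, as described above
    return cudaMemcpyToSymbol(c_table, host, count * sizeof(float));
}

cudaError_t run_apply_table(const float* d_in, float* d_out, int n)
{
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    apply_table<<<blocks, threads>>>(d_in, d_out, n);
    return cudaGetLastError();   // reports launch errors only
}
```

The .cpp file then calls set_table() once before launching work through run_apply_table(); it needs cuda_runtime.h only for cudaError_t and cudaMalloc/cudaMemcpy, which the host compiler handles fine.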
If you want to switch to using a main.cu file, you can use an extern declaration to access the constant symbol from another compilation unit. For this you will need to compile with relocatable device code and device linking (e.g. nvcc -rdc=true). There are examples like this already published.
I haven’t tried using a constant symbol with templating. At first glance, I don’t think it will work. (A simple test suggests to me it will not work: cudaErrorInvalidSymbol).
It is not clear to me what exactly you are trying to achieve, but an alternative to consider may be passing the desired data as a kernel argument, since kernel arguments are placed in constant memory. With recent hardware and software, CUDA is able to pass a good-sized chunk of data this way. This avoids separate cudaMemcpyToSymbol() calls.
CUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit from 4,096 bytes to 32,764 bytes on all device architectures including NVIDIA Volta and above.
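A minimal sketch of the kernel-argument alternative, assuming a hypothetical Coeffs struct of polynomial coefficients: the struct is passed by value, so it travels in the kernel parameter space and no cudaMemcpyToSymbol() call is needed. It is deliberately kept far below the old 4,096-byte limit so it also works on toolkits before CUDA 12.1; per the quote above, structs up to 32,764 bytes would need CUDA 12.1 and a Volta or newer GPU.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameter block passed by value as a kernel argument.
struct Coeffs {
    float c[64];   // c[k] multiplies x^k
    int count;     // number of coefficients actually used
};
static_assert(sizeof(Coeffs) <= 4096, "stay within the pre-12.1 parameter limit");

__global__ void poly_eval(Coeffs p, const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Horner's rule; p.c is read from the kernel parameter space
    float acc = 0.0f;
    for (int k = p.count - 1; k >= 0; --k) acc = acc * x[i] + p.c[k];
    y[i] = acc;
}

// Assumes n > 0 (a zero-block launch would be an invalid configuration).
cudaError_t launch_poly_eval(const Coeffs& p, const float* d_x, float* d_y, int n)
{
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    poly_eval<<<blocks, threads>>>(p, d_x, d_y, n);   // p copied into parameter space
    return cudaGetLastError();
}
```

Because the data is bundled with each launch, different launches can use different coefficient sets without any separate symbol upload in between.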
In fact, I want to encapsulate CUDA's constant memory in a C++ class that implements upload, download, and memset functions. But I have to export each symbol's device pointer from the .cu file that defines the constant memory. If there are many constant variables, this requires implementing a get_pointer() function for each one.
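A sketch of the wrapper described here, under stated assumptions: c_weights, get_weights_symbol() and ConstantBuffer are hypothetical names. The .cu side exports the symbol's device address and size via cudaGetSymbolAddress() and cudaGetSymbolSize(); the class itself only uses cudaMemcpy() on that address, so it could be compiled by the host compiler in a .cpp file. memset() is done here through a host staging buffer to keep to plain copies. Note that one exporter function is still needed per symbol, which is exactly the overhead being discussed.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// ---- would live in the .cu file that defines the constant memory ----
__constant__ float c_weights[16];   // hypothetical constant symbol

// Plain C++ signature; one such exporter per symbol (the get_pointer() idea).
cudaError_t get_weights_symbol(void** dev_ptr, size_t* bytes)
{
    cudaError_t err = cudaGetSymbolAddress(dev_ptr, c_weights);
    if (err != cudaSuccess) return err;
    return cudaGetSymbolSize(bytes, c_weights);
}

// ---- could live in a .cpp file (host compiler only) ----
class ConstantBuffer {
public:
    ConstantBuffer(void* dev_ptr, size_t bytes) : ptr_(dev_ptr), bytes_(bytes) {}

    cudaError_t upload(const void* host, size_t n) const {
        if (n > bytes_) return cudaErrorInvalidValue;
        return cudaMemcpy(ptr_, host, n, cudaMemcpyHostToDevice);
    }
    cudaError_t download(void* host, size_t n) const {
        if (n > bytes_) return cudaErrorInvalidValue;
        return cudaMemcpy(host, ptr_, n, cudaMemcpyDeviceToHost);
    }
    // Fill every byte with 'value' by staging on the host, then copying.
    cudaError_t memset(int value) const {
        std::vector<unsigned char> staging(bytes_, static_cast<unsigned char>(value));
        return upload(staging.data(), bytes_);
    }
    size_t size() const { return bytes_; }

private:
    void* ptr_;      // device address of the __constant__ symbol
    size_t bytes_;   // size of the symbol in bytes
};
```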
I do not know your use case, but off-hand I fail to see the benefits of such an arrangement.
Note that there is not a single constant memory, but multiple banks of constant memory (the details differ by GPU architecture), one of which is used to back __constant__ variables.
You could always aggregate multiple variables into a single data object, a struct in the simplest case, to reduce the number of distinct data objects in constant memory.
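A minimal sketch of that aggregation, assuming a hypothetical SimParams struct that replaces several separate __constant__ variables: one symbol means one upload/download pair, and individual fields can still be updated through the offset parameter of cudaMemcpyToSymbol() combined with offsetof(). All names here are illustrative.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical: formerly separate constants grouped into one object.
struct SimParams {
    float dt;
    float gravity;
    int   steps;
    float damping[4];
};

__constant__ SimParams c_params;   // the only constant symbol to export

// Copies a few fields back so the host can see what the device reads.
__global__ void read_params(float* out)
{
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        out[0] = c_params.dt;
        out[1] = c_params.gravity;
        out[2] = static_cast<float>(c_params.steps);
        out[3] = c_params.damping[3];
    }
}

// One upload/download pair covers every field.
cudaError_t upload_params(const SimParams& p)
{
    return cudaMemcpyToSymbol(c_params, &p, sizeof(SimParams));
}

cudaError_t download_params(SimParams* p)
{
    return cudaMemcpyFromSymbol(p, c_params, sizeof(SimParams));
}

// Update a single field by byte offset into the symbol.
cudaError_t upload_gravity(float g)
{
    return cudaMemcpyToSymbol(c_params, &g, sizeof(float), offsetof(SimParams, gravity));
}
```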