The OpenCL spec defines three different features that in various ways seem to resemble CUDA’s constant memory. When creating a buffer, you can pass the CL_MEM_READ_ONLY flag. Alternatively, when declaring the input arguments to a kernel, you can mark a pointer as “__constant” or as “const __global”. The spec is extremely vague about what each of these is supposed to do. What is the actual effect of each of them? Which of them most closely corresponds to CUDA’s constant memory?
It is my understanding that kernel arguments declared with ‘__constant’ will be placed in CUDA’s constant memory. Declaring an argument with just ‘const’ together with another address space qualifier, such as ‘__global’, only tells the compiler that the kernel will not change the contents of the memory buffer; the data remains in global memory. The specification has the following to say about CL_MEM_READ_ONLY:
“This flag specifies that the memory object is a read-only memory object when used inside a kernel. Writing to a buffer or image object created with CL_MEM_READ_ONLY inside a kernel is undefined.”
I’m not sure what this actually means for a particular implementation on a particular device.
Can anyone from Nvidia provide an authoritative answer to this? I already know what the spec says about each of these features; I’m asking how they are actually implemented on Nvidia GPUs. What do I need to do if I want a piece of information to be stored in the GPU’s constant memory pool?
I’ve also been told by someone at AMD that they plan to implement “const __global” by loading the data through the texture cache. Will Nvidia do something similar?
Specifying “__constant” is the only way to guarantee data is stored in GPU constant memory.
I don’t think it would be possible to map “const __global” to constant memory in general because of the limited size of the constant cache.
Reading linear memory through texture is also limited to 2^27 elements, so I’m not sure we could read “const __global” through texture in the general case either.
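To make the size argument above concrete, here is a small C sketch (not from the thread) that checks whether a read-only buffer could fit in constant memory, and whether it stays within the 2^27-element limit for reading linear memory through texture. It assumes a 64 KB constant buffer, which matches CUDA’s constant memory size and the minimum CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE for a full-profile OpenCL device; a real host program would query that value with clGetDeviceInfo rather than hard-coding it. The names `classify_buffer` and `read_paths` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 64 KB of constant memory. Query CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE
   via clGetDeviceInfo on a real device instead of hard-coding this. */
#define ASSUMED_CONSTANT_BYTES   (64u * 1024u)   /* 65536 bytes */

/* Limit on reading linear memory through texture, as stated in the post. */
#define TEXTURE_LINEAR_MAX_ELEMS (1ul << 27)     /* 134217728 elements */

/* Hypothetical result type: which read paths a const buffer could use. */
typedef struct {
    bool fits_constant;  /* small enough to pass as a __constant argument */
    bool fits_texture;   /* within the 2^27-element linear texture limit */
} read_paths;

/* Hypothetical helper: classify a buffer of `count` elements of `elem_size` bytes.
   Dividing the limit by elem_size avoids overflow in count * elem_size. */
static read_paths classify_buffer(size_t count, size_t elem_size) {
    read_paths p;
    assert(elem_size > 0);
    p.fits_constant = count <= ASSUMED_CONSTANT_BYTES / elem_size;
    p.fits_texture  = count <= TEXTURE_LINEAR_MAX_ELEMS;
    return p;
}
```

For example, 16,384 floats (exactly 64 KB) pass the constant-memory check, while 2^27 floats (512 MiB) are within the texture limit but far too large for constant memory. That gap is why “const __global” cannot simply be mapped to constant memory for arbitrary buffers.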
“__constant” is specified in the kernel, that is, at the point where you access the data. How does the implementation know to store that data in constant memory rather than global memory in the first place?