Am I using the constant memory properly?

I have read several old threads claiming that one can't dynamically allocate constant memory, even from the host; this person wants to do the same thing that I do -> http://stackoverflow.com/questions/271273/dynamic-allocation-of-constant-memory-in-cuda

The typical answer is that one should use texture memory instead. Anyway, I ran the following test:

#include <cstdio>	/* needed for device-side printf */

__constant__ int* c;

__global__ void kernel()
{
	/* UPDATE: ok, the following works, suggesting it's not constant memory... */
	c[4] = 666;

	for (int i = 0; i < 10; ++i)
	{
		printf("%d\n", c[i]);
	}
}

int main()
{
	int host_c[10] = {0,1,2,3,4,5,6,7,8,9};
	int* dev_c;

	cudaMalloc(&dev_c, 10 * sizeof(int));
	cudaMemcpy(dev_c, host_c, 10 * sizeof(int), cudaMemcpyHostToDevice);
	/* copy the pointer *value* into the __constant__ symbol */
	cudaMemcpyToSymbol(c, &dev_c, sizeof(int*));

	kernel<<<1,1>>>();
	cudaDeviceSynchronize();	/* without this, the device printf output may never appear */
	cudaFree(dev_c);
	return 0;
}

This seems to work: I get no errors and my 10 numbers are printed correctly.

I know from the book “Programming Massively Parallel Processors” that constant memory is just global memory that is cached on-chip.
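For reference, the conventional way to use constant memory is a statically sized `__constant__` array; here is a minimal sketch (the array name and size are my own, not from anyone's post):

```cuda
#include <cstdio>

#define N 10

__constant__ int c_data[N];   // real constant memory: size fixed at compile time

__global__ void kernel()
{
    for (int i = 0; i < N; ++i)
        printf("%d\n", c_data[i]);   // reads are served by the constant cache
    // c_data[0] = 1;  // would not compile: constant memory is read-only in device code
}

int main()
{
    int host_c[N] = {0,1,2,3,4,5,6,7,8,9};
    // Copy straight into the __constant__ array -- no cudaMalloc needed.
    cudaMemcpyToSymbol(c_data, host_c, N * sizeof(int));
    kernel<<<1,1>>>();
    cudaDeviceSynchronize();  // flush device-side printf
    return 0;
}
```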

My questions are:

  • Just because the "c" pointer is in constant memory, is the area of global memory that c ends up pointing to automatically treated as "constant memory" by the device? Or am I being naive and doing it wrong?

  • Is using texture memory really equivalent to using constant memory? If it is, why did they invent "constant memory" at all?

I need to load data from a file. Those data may exceed 65536 bytes (the 64 KB constant memory limit), so I would actually need a dynamically allocated vector of dynamically allocated constant memory banks to hold them. But once those data are on the device, they will never change…
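One workaround I can imagine for the 64 KB limit is paging the data through a single fixed-size `__constant__` buffer, one chunk per kernel launch. This is only a hypothetical sketch (all names are mine, and it only makes sense if each kernel launch needs just one chunk at a time):

```cuda
#include <cstdio>
#include <cstdlib>

#define CHUNK (64 * 1024 / sizeof(float))   // floats per 64 KB bank; in practice
                                            // leave headroom for other constants

__constant__ float c_chunk[CHUNK];

__global__ void process_chunk(int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        printf("%f\n", c_chunk[i]);  // placeholder for the real per-element work
}

int main()
{
    const size_t total = 3 * CHUNK;           // pretend the file held this much
    float* host = (float*)malloc(total * sizeof(float));
    for (size_t i = 0; i < total; ++i) host[i] = (float)i;

    for (size_t off = 0; off < total; off += CHUNK) {
        size_t n = (total - off < CHUNK) ? total - off : CHUNK;
        cudaMemcpyToSymbol(c_chunk, host + off, n * sizeof(float));
        process_chunk<<<(n + 255) / 256, 256>>>((int)n);
        cudaDeviceSynchronize();              // finish before overwriting the bank
    }
    free(host);
    return 0;
}
```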

Update: Sorry, I only just thought of the basic test that answers my first question: I can modify the data pointed to by c. This suggests those data are not treated as constant memory! Ok, but my second question is still worth answering…

Maybe have a look at previous discussions about “constant” memory:
https://devtalk.nvidia.com/default/topic/481176/function-parameter-vs-constant-memory/
Bill

Thank you for your answer, Bill.
I guess you suggested this discussion mainly because of its last answer:

This is definitely an important issue to me, as each of my threads has to access a different float from a vector…

To be precise, each of my threads just transfers a float from one vector currently held in global memory (and which will never be modified) to another vector, also in global memory, without doing any computation! It’s just an initialization pass over a vector for further computations by other threads. I was wondering if I could at least speed up the reads by using constant memory. Apparently not.
I definitely need more information about concurrent reading and writing in global memory.
I just begun to read this -> https://developer.nvidia.com/content/how-access-global-memory-efficiently-cuda-cc-kernels
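For what it's worth, the main point of that article is that consecutive threads should touch consecutive addresses, so the hardware can coalesce each warp's accesses. A minimal sketch of such a copy kernel (names are mine):

```cuda
// Coalesced copy: thread i reads src[i] and writes dst[i], so each warp
// issues one fully coalesced load and one fully coalesced store.
__global__ void copy_init(const float* __restrict__ src,
                          float* __restrict__ dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i];
}

// launch, e.g.: copy_init<<<(n + 255) / 256, 256>>>(dev_src, dev_dst, n);
```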

Update: I just checked; I could even skip the write that initializes the other vector in global memory, but I definitely need the fastest concurrent reads achievable.
Sorry, I didn’t write the original non-parallel code…

Anyway, my second question is still worth answering: why did they invent constant memory if texture memory can do at least as much?

Constant memory is a convenience for the programmer. In the simplest case it’s just adding __constant__ and one cudaMemcpyToSymbol() call.

Textures usually require more setup, and reads have to go through tex1D()/tex2D()/tex3D() (or tex1Dfetch()) calls, but they are not limited to 64 KB, and they support interpolation, coordinate clamping and other goodies.
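To make the "more setup" concrete, here is roughly what the texture route looks like with the texture-object API (available since CUDA 5.0); all names are my own sketch, and compare it with a single cudaMemcpyToSymbol() for constant memory:

```cuda
#include <cstdio>

__global__ void kernel(cudaTextureObject_t tex, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        printf("%f\n", tex1Dfetch<float>(tex, i));  // reads go through the texture cache
}

int main()
{
    const int n = 10;
    float host[n] = {0,1,2,3,4,5,6,7,8,9};
    float* dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Describe the memory backing the texture...
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dev;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    // ...and how it should be read.
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, NULL);

    kernel<<<1, 1>>>(tex, n);
    cudaDeviceSynchronize();
    cudaDestroyTextureObject(tex);
    cudaFree(dev);
    return 0;
}
```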