Should I use constant memory or Texture?

OK, I have two lookups I need to perform in my kernels. First, I need to perform some calculations using coefficients in a 3x3 matrix of floats. What is the best way for me to store the 3x3 matrix, and should I access it using a texture or using constant memory? My understanding is that since all threads will be accessing the same locations of the matrix, I should be using constant memory. If that is true, how do I allocate constant memory?

I also have multiple 1024x768 lookup tables in which each thread will be accessing different locations. These should be used as texture lookups, correct?

I know this topic has been touched on multiple times, but I just wanted to make sure. Plus, I would like to know how to allocate constant memory.


If all threads in a warp are accessing the same location in memory simultaneously, constant memory is the best.

Just declare the constant memory at file scope with the __constant__ qualifier:
__constant__ float matrix[3][3];
To populate the values in the matrix, use cudaMemcpyToSymbol().

For the 1024x768 lookup tables, you will need to use textures for the simple reason that 1024x768 entries is far more than fits in the 64 KB of constant memory :) Performance will be decent, but will be much better if nearby threads in a warp access nearby locations in the lookup table, because the texture cache is optimized for 2D spatial locality.
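As a rough sketch of the texture route for one 1024x768 table: the code below uses the texture object API from newer CUDA toolkits, which is an assumption on my part and not the texture-reference API that was current when this thread was written. The names sampleTable and makeTableTexture are hypothetical, and the table is assumed to hold floats in row-major order.

```cuda
#include <cuda_runtime.h>

// Each thread fetches one table entry at its own (x, y) position.
__global__ void sampleTable(cudaTextureObject_t tex, const int2* coords,
                            float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Unnormalized coordinates; +0.5f addresses the centre of the texel.
        out[i] = tex2D<float>(tex, coords[i].x + 0.5f, coords[i].y + 0.5f);
    }
}

// Hypothetical helper: copy a row-major host table into a CUDA array and
// wrap it in a texture object. The caller later releases both with
// cudaDestroyTextureObject() and cudaFreeArray().
cudaError_t makeTableTexture(const float* hostTable, int width, int height,
                             cudaArray_t* arrayOut, cudaTextureObject_t* texOut)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaError_t err = cudaMallocArray(arrayOut, &desc, width, height);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy2DToArray(*arrayOut, 0, 0, hostTable,
                              width * sizeof(float),   // source pitch in bytes
                              width * sizeof(float),   // row width in bytes
                              height, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = *arrayOut;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;   // clamp out-of-range x
    td.addressMode[1] = cudaAddressModeClamp;   // clamp out-of-range y
    td.filterMode = cudaFilterModePoint;        // exact entries, no interpolation
    td.readMode = cudaReadModeElementType;      // return floats unchanged
    td.normalizedCoords = 0;                    // address by texel index

    return cudaCreateTextureObject(texOut, &res, &td, nullptr);
}
```

With several tables you would create one texture object per table and pass each to the kernel as an ordinary argument. Point filtering is chosen here so the lookup returns the stored entries; switching to cudaFilterModeLinear would interpolate between neighbours instead.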

So are you saying I have to do a memcpy inside my kernel? And is it better to declare it as float[3][3] or as float[9]? I will be running this kernel in a loop and the coefficients will not be changing.

Thanks again.

To write to constant memory you need to call cudaMemcpyToSymbol() from host code, not from kernel.

float[3][3] or float[9] shouldn’t make much of a difference, since both occupy the same nine contiguous floats. Use whichever is more natural for your algorithm.
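To make the host/kernel split concrete, here is a minimal sketch. The kernel applyMatrix and the helper runApplyMatrix are hypothetical names, and the per-thread work (multiplying one 3-vector by the matrix) is only an assumed stand-in for the real calculation. The upload passes the symbol itself, which is the form newer toolkits expect, and it copies a flat host float[9] into a float[3][3] symbol to show that the two layouts are interchangeable.

```cuda
#include <cuda_runtime.h>

// Coefficients live in constant memory at file scope, so the host-side
// copy and the kernel refer to the same symbol.
__constant__ float matrix[3][3];

// Each thread multiplies one 3-vector by the shared matrix. All threads in
// a warp read the same matrix[r][c] at the same time, which is the
// broadcast pattern the constant cache serves well.
__global__ void applyMatrix(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int r = 0; r < 3; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < 3; ++c)
            acc += matrix[r][c] * in[3 * i + c];
        out[3 * i + r] = acc;
    }
}

// Hypothetical host helper (assumes n > 0). hostCoeffs is a flat,
// row-major float[9], which has the same memory layout as float[3][3].
cudaError_t runApplyMatrix(const float* hostCoeffs, const float* hostIn,
                           float* hostOut, int n)
{
    // Host-side upload; this never happens inside the kernel.
    cudaError_t err = cudaMemcpyToSymbol(matrix, hostCoeffs, 9 * sizeof(float));
    if (err != cudaSuccess) return err;

    float* dIn = nullptr;
    float* dOut = nullptr;
    size_t bytes = 3 * (size_t)n * sizeof(float);
    err = cudaMalloc(&dIn, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dOut, bytes);
    if (err != cudaSuccess) { cudaFree(dIn); return err; }

    err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        applyMatrix<<<(n + threads - 1) / threads, threads>>>(dIn, dOut, n);
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

Since the coefficients do not change, the cudaMemcpyToSymbol() call only needs to happen once before the loop; constant memory keeps its contents across kernel launches.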

OK, I read the cudaMemcpyToSymbol() description in D.5.18 and am still confused. Could somebody give me an example? If I put __constant__ float matrix[3][3] in the host code it will not compile, and if I put it in the kernel code I get an execution error of “invalid device symbol”.

I’m trying this host code:



where cL and cR are two different float[9]*

Device code:

__constant__ float dcL[9];
__constant__ float dcR[9];

__global__ void
cuGenRectMap(float* RectMapXL, float* RectMapYL, float* RectMapXR, float* RectMapYR, int* RES_H)




I feel like I’m not even close to doing this right. I searched the examples for “constant” without any results.

I don't have my code at hand, but as far as I remember I copied it from the programming guide.

cudaMemcpyToSymbol() expects the name of the device variable as an ASCII string in its first argument:

cudaMemcpyToSymbol( "name_of_variable_in_kernel", cL, 9*sizeof(float) );
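For anyone reading this with a newer toolkit: as far as I know, the string form was dropped in CUDA 5.0, and the variable itself is passed instead of its name in quotes. A minimal sketch under that assumption, reusing dcL and dcR from the question (the helpers uploadCoefficients and readBackCoefficients are hypothetical names added for illustration):

```cuda
#include <cuda_runtime.h>

// Must be visible in the same translation unit as the copy calls below
// (unless relocatable device code is used); a mismatch here is one
// possible source of "invalid device symbol".
__constant__ float dcL[9];
__constant__ float dcR[9];

// Hypothetical helper: upload both coefficient sets from host code.
// cL and cR each point to nine host floats.
cudaError_t uploadCoefficients(const float* cL, const float* cR)
{
    // Pass the symbol itself, not the string "dcL".
    cudaError_t err = cudaMemcpyToSymbol(dcL, cL, 9 * sizeof(float));
    if (err != cudaSuccess) return err;
    return cudaMemcpyToSymbol(dcR, cR, 9 * sizeof(float));
}

// Hypothetical helper: read the values back to confirm the upload worked.
cudaError_t readBackCoefficients(float* cL, float* cR)
{
    cudaError_t err = cudaMemcpyFromSymbol(cL, dcL, 9 * sizeof(float));
    if (err != cudaSuccess) return err;
    return cudaMemcpyFromSymbol(cR, dcR, 9 * sizeof(float));
}
```

Checking the returned cudaError_t (for example with cudaGetErrorString()) right after the copy makes it much easier to tell a bad symbol from a problem in the kernel itself.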

That did it! Thanks a lot.