constant vs shared memory

Archer · February 22, 2007, 8:51am

Hi,

question 1:

According to the programming guide, shared memory is very efficient (2 clock cycles to issue a read or write operation) as long as there are no bank conflicts. As for constant memory, the guide says it is cached. My first question is: how many clock cycles are needed to issue a read operation on constant memory?

question 2:

In the kernel, all threads in one block access (read only) one shared array, ‘AS[1000]’:

__global__ void
testKernel(float* g_idata1, float* g_idata2, float* g_odata)
{
    __shared__ float AS[1000];

    // Load data into shared memory
    for (int i = 0; i < 1000; ++i)
    {
        AS[i] = g_idata1[i];
    }

    for (int i = 0; i < 1000; ++i)
    {
        for (int j = 0; j < 1000; ++j)
        {
            g_odata[j + i * 1000] = max(AS[i], g_idata2[j]);
        }
    }

    ......
}

Since I use 512 threads per block, it seems there will be bank conflicts when accessing the shared array AS[1000]. Am I right? Because AS[1000] is only ever read, I want to know whether it would be more efficient in my case to store it in a constant array instead of shared memory. If so, how should I use constant memory? I want to declare and initialize a constant array on the host like this:

extern __constant__ float AS[1000] = {...};

And in the kernel code, I access it:

extern __constant__ float AS[1000];

__global__ void
testKernel(float* g_idata, float* g_odata)
{
    for (int i = 0; i < 1000; ++i)
    {
        for (int j = 0; j < 1000; ++j)
        {
            g_odata[j + i * 1000] = max(AS[i], g_idata[j]);
        }
    }

    ......
}

I can only work in emulation mode for now, so I don’t know whether this is the correct way to use constant memory. Thanks.
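
For reference, here is a sketch of how the shared-memory version above is often written so that the threads of a block split the load instead of each thread copying all 1000 elements. It is an illustration, not the poster's code: the names `sharedKernel` and `runSharedExample`, the single-block launch with 512 threads, and the test data are all assumptions, and it is written against a current CUDA toolkit rather than the 2007 release. As far as the programming guide describes it, when every thread of a warp reads the same shared-memory word (here `AS[i]`) the value is broadcast, so this access pattern should not cause bank conflicts.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 1000;  // array length from the original post

// Hypothetical kernel name; intended for a single block (assumption).
__global__ void sharedKernel(const float* g_idata1, const float* g_idata2,
                             float* g_odata)
{
    __shared__ float AS[N];

    // Cooperative load: thread t copies elements t, t + blockDim.x, ...
    for (int k = threadIdx.x; k < N; k += blockDim.x)
        AS[k] = g_idata1[k];
    __syncthreads();  // AS must be fully written before any thread reads it

    // Each thread owns a strided set of columns j. For a given i, every
    // thread reads the same element AS[i].
    for (int j = threadIdx.x; j < N; j += blockDim.x) {
        float v = g_idata2[j];
        for (int i = 0; i < N; ++i)
            g_odata[j + i * N] = fmaxf(AS[i], v);
    }
}

// Hypothetical host helper: runs the kernel on made-up data and checks the
// result against a CPU computation. Returns true if everything matches.
bool runSharedExample()
{
    std::vector<float> hA(N), hB(N), hOut(static_cast<size_t>(N) * N);
    for (int k = 0; k < N; ++k) { hA[k] = float(k); hB[k] = float(N - k); }

    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dA, N * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dB, N * sizeof(float)) != cudaSuccess) {
        cudaFree(dA); return false;
    }
    if (cudaMalloc(&dOut, hOut.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(dA); cudaFree(dB); return false;
    }
    cudaMemcpy(dA, hA.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    sharedKernel<<<1, 512>>>(dA, dB, dOut);  // 512 threads, as in the post

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hOut.data(), dOut,
                                 hOut.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (hOut[j + static_cast<size_t>(i) * N] != std::max(hA[i], hB[j]))
                return false;
    return true;
}
```

Calling `runSharedExample()` from `main` and compiling with `nvcc` should be enough to try it on real hardware.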

Archer · February 23, 2007, 12:37pm

Can anyone tell me whether the above way of using a constant variable is correct? Must I load values into constant memory using ‘cudaMemcpyToSymbol’?

tachyon_john · February 23, 2007, 4:01pm

I’m using cudaMemcpyToSymbol() to populate/update my constant area. I got excellent performance using the constant buffer for the case where all of the threads read the same constant elements at the same time. If I understand the programming guide correctly, for that specific case, the speed is the same as reading from a register…

John
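
A minimal sketch of the approach described above: a file-scope `__constant__` array filled from the host with `cudaMemcpyToSymbol()`, and a kernel in which all threads read the same constant element in each loop iteration. The names `constKernel` and `runConstantExample`, the launch configuration, and the test data are assumptions made for illustration, and the code targets a current CUDA toolkit rather than the 2007 release.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 1000;  // array length from the original post

// Constant array at file scope; its contents are written from the host.
__constant__ float AS[N];

// Hypothetical kernel name. Each thread handles one column j; for a given i,
// every thread reads the same element AS[i].
__global__ void constKernel(const float* g_idata, float* g_odata)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= N) return;

    float v = g_idata[j];
    for (int i = 0; i < N; ++i)
        g_odata[j + i * N] = fmaxf(AS[i], v);
}

// Hypothetical host helper: fills the constant array, runs the kernel on
// made-up data, and checks the result on the CPU. Returns true on a match.
bool runConstantExample()
{
    std::vector<float> hAS(N), hIn(N), hOut(static_cast<size_t>(N) * N);
    for (int k = 0; k < N; ++k) { hAS[k] = float(k); hIn[k] = float(N - k); }

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, N * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, hOut.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn); return false;
    }
    cudaMemcpy(dIn, hIn.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    // Copy the host data into the __constant__ symbol.
    if (cudaMemcpyToSymbol(AS, hAS.data(), N * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn); cudaFree(dOut); return false;
    }

    constKernel<<<(N + 255) / 256, 256>>>(dIn, dOut);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hOut.data(), dOut,
                                 hOut.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dIn); cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (hOut[j + static_cast<size_t>(i) * N] != std::max(hAS[i], hIn[j]))
                return false;
    return true;
}
```

The constant array here is declared without an initializer and filled at run time, which matches the `cudaMemcpyToSymbol()` usage described in the post.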
