Suppose I have a kernel whose header is given below. How should I use shared memory effectively to gain further performance? Right now I am accessing the elements through global memory.
func<<<625, 256>>>(float* outptr, float* arguments);

My output array size is 160,000 elements (625 blocks × 256 threads).
In my application each array element is processed by a single thread, but the mapping is random (thread 1 might process element 1, thread 2 might process element 3, and so on). Since I am using global memory, this takes a huge amount of time.
How can I keep my elements in shared memory to gain performance?
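A minimal sketch of the shared-memory pattern, assuming the kernel name and arguments from the launch above and assuming the reuse is confined to elements within the same block (shared memory is per-block, so it only helps when threads in one block reuse each other's data; it cannot speed up truly random accesses across the whole 160,000-element array):

```cuda
// Sketch: each block stages its 256 elements into shared memory with one
// coalesced global read, then all further accesses hit on-chip memory.
__global__ void func(float* outptr, const float* arguments)
{
    __shared__ float tile[256];                 // one element per thread

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;    // global index, 0..159999

    tile[tid] = arguments[gid];                 // single coalesced global read
    __syncthreads();                            // wait until tile is populated

    // Illustrative reuse: combine this element with an in-block neighbour.
    // Both reads now come from shared memory, not DRAM.
    int next = (tid + 1) % blockDim.x;
    outptr[gid] = 0.5f * (tile[tid] + tile[next]);
}
```

If each element is read exactly once and never shared between threads, staging it through shared memory adds work without saving any global traffic.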
As long as you access global memory once to read the value and once to write it, I do not think much optimization can be done. When you access an element several times, it is advisable to store the value in a local variable in the kernel; that variable will then live in a register (or, if you stage it explicitly, in shared memory), both of which are far faster than repeated global loads.
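A sketch of that register-caching advice (kernel and variable names are illustrative, not from the original post): read the global value once into a local variable, reuse it freely, write the result once.

```cuda
// Each thread reads its element ONCE into a register, does all the
// arithmetic there, and writes the result back ONCE.
__global__ void process(float* out, const float* in, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid >= n) return;                // guard against out-of-range threads

    float v = in[gid];                   // single global read; v sits in a register
    float r = v * v + 1.0f;             // reuse v at register speed
    r += 0.5f * v;                      // no further global traffic for v
    out[gid] = r;                       // single global write
}
```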
Coalescing is a very important concept in CUDA - it is essential to getting any real speed. If you're not familiar with it, check the programming guide and the forums.
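To illustrate the point about coalescing, here is a hedged sketch (kernel names are mine) contrasting a coalesced copy with a strided one. When consecutive threads in a warp touch consecutive addresses, the hardware can combine the warp's loads into a few wide memory transactions; strided or random indexing breaks that and multiplies the number of transactions:

```cuda
// Coalesced: thread i touches element i, so a warp's 32 reads fall in
// contiguous memory and are serviced by a handful of transactions.
__global__ void copy_coalesced(float* dst, const float* src, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n)
        dst[gid] = src[gid];
}

// Strided: thread i touches element i*stride, so the warp's addresses are
// scattered and the same copy costs many more memory transactions.
__global__ void copy_strided(float* dst, const float* src, int n, int stride)
{
    int idx = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (idx < n)
        dst[idx] = src[idx];
}
```

This is why the random thread-to-element mapping described in the question is expensive: it defeats coalescing regardless of whether shared memory is used.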