access speed of shared memory and global memory

Cheng_ed · August 6, 2009, 12:14pm

Hi everyone,
I’ve just run a test comparing the access speed of shared memory and global memory, and the result surprised me. The CUDA kernels are below. On the host, I launch a single block with a single thread in it.

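kernel1: fetch data from global memory and write it to the result buffer that is sent back to the host
__global__ void fun(char *data, char *result)
{
    // Flattened block and thread indices. The built-ins are gridDim and blockDim;
    // dimGrid is only the name of the host-side launch variable.
    int bid = blockIdx.x + blockIdx.y * gridDim.x;   // not used below
    int tid = threadIdx.x + threadIdx.y * blockDim.x;
    int index = 0;
    char block;
    while (index < 10000) {
        block = data[tid];       // read from global memory
        result[tid] = block;     // write to global memory
        index++;
    }
}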
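kernel2: fetch data from shared memory and write it to the result buffer that is sent back to the host
__global__ void fun(char *data, char *result)
{
    // Flattened block and thread indices. The built-ins are gridDim and blockDim;
    // dimGrid is only the name of the host-side launch variable.
    int bid = blockIdx.x + blockIdx.y * gridDim.x;   // not used below
    int tid = threadIdx.x + threadIdx.y * blockDim.x;
    int index = 0;
    char block;
    __shared__ char sub[1];
    sub[0] = 'K';
    while (index < 10000) {
        block = sub[0];          // read from shared memory
        result[tid] = block;     // write to global memory
        index++;
    }
}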
Why is kernel1 faster than kernel2? Can anybody help me understand this?

YDD · August 6, 2009, 1:14pm

I can’t speak to the specifics of the example, but if you only had one block and one thread, you’re pretty much missing the point of the GPU. Shared memory is there to ensure that threads within a block can collaborate… one thread per block means no collaboration.
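
To make the collaboration point concrete, here is a minimal sketch of a block-wide sum in which the threads of a block share data through shared memory. It is not code from this thread: the kernel name blockSum, the THREADS constant, and the input size are all assumptions chosen for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 256  // assumed threads per block; must be a power of two for this reduction

// Each block sums up to THREADS consecutive ints and writes one partial sum.
__global__ void blockSum(const int *in, int *out, int n)
{
    __shared__ int tile[THREADS];                 // visible to every thread in the block
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    tile[tid] = (gid < n) ? in[gid] : 0;          // one global read per thread
    __syncthreads();                              // wait until the whole tile is loaded

    // Tree reduction: each step reads values written by other threads in the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = tile[0];                // one global write per block
}

int main()
{
    const int n = 1 << 20;                        // assumed input size
    const int blocks = (n + THREADS - 1) / THREADS;
    int *h_in = new int[n];
    int *h_out = new int[blocks];
    for (int i = 0; i < n; ++i) h_in[i] = 1;

    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, THREADS>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    // Finish the sum on the host and compare against the known answer.
    long long total = 0;
    for (int b = 0; b < blocks; ++b) total += h_out[b];
    printf("total = %lld (expected %d)\n", total, n);

    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return total == n ? 0 : 1;
}
```

With a single thread per block, as in the test above, the loop over stride never runs and the shared array only adds a store and a load, which is the sense in which one thread per block has nothing to collaborate with.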
