Global vs Shared Memory Grayscale performance same for both the codes.

Cudip · June 4, 2011, 8:43am

Hi guys,

I measured the performance of my grayscale filter, which converts an input RGB image to grayscale, using both shared and global memory. Surprisingly, the shared memory code took more time than the global memory code. Below is the pseudocode for both filters. Can anyone point out where I am going wrong? Or is using shared memory for a grayscale conversion simply a bad decision?

The pseudo code with shared memory:
__shared__ unsigned char sh_Tile_in[16 * 16 * 4];
int tx = threadIdx.x + (blockIdx.x * blockDim.x);
int ty = threadIdx.y + (blockIdx.y * blockDim.y);
int offset = tx + ty * blockDim.x * gridDim.x;
int sh_offset = threadIdx.x + threadIdx.y * 16;
/* some copy stuff */
if(offset < width * height)
{
sh_Tile_in[sh_offset] = 0.3 * sh_Tile_in[sh_offset * 4 + 0] + 0.6 * sh_Tile_in[sh_offset * 4 + 1] + 0.1 * sh_Tile_in[sh_offset * 4 + 2];
}
__syncthreads();
gpu_in[offset] = sh_Tile_in[sh_offset];

The code with global memory:
if(offset < width * height)
{
color = 0.3 * gpu_in[offset * 4 + 0] + 0.6 * gpu_in[offset * 4 + 1] + 0.1 * gpu_in[offset * 4 + 2];
gpu_in_4[offset * 4 + 0] = color;
gpu_in_4[offset * 4 + 1] = color;
gpu_in_4[offset * 4 + 2] = color;
gpu_in_4[offset * 4 + 3] = 0;
}

Thanks in advance…

tera · June 4, 2011, 11:16am

Why do you expect the shared memory version to be faster? It does not reuse any data from shared memory, so the number of global memory reads is the same as in the other version, just with more overhead.
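To make that concrete, here is a minimal sketch of the two approaches side by side. It is not the original poster's code: the kernel names, the `luma` helper, the `uchar4` pixel layout, the 64x48 test image, and the fixed 16x16 block size are all assumptions chosen for illustration. Both kernels issue exactly one global load and one global store per pixel; the staged one only adds a shared-memory write, a barrier, and a shared-memory read on top.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper; the 0.3 / 0.6 / 0.1 weights are the ones from the original post.
__host__ __device__ inline unsigned char luma(unsigned char r, unsigned char g, unsigned char b)
{
    return static_cast<unsigned char>(0.3f * r + 0.6f * g + 0.1f * b);
}

// Direct version: one 4-byte global load and one 4-byte global store per pixel.
__global__ void gray_global(const uchar4* in, uchar4* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    uchar4 p = in[y * width + x];
    unsigned char c = luma(p.x, p.y, p.z);
    out[y * width + x] = make_uchar4(c, c, c, 0);
}

// Staged version: the same global load and store, plus a shared-memory
// round trip and a barrier. No thread reads another thread's pixel, so
// nothing is reused and the staging is pure overhead.
__global__ void gray_shared(const uchar4* in, uchar4* out, int width, int height)
{
    __shared__ uchar4 tile[16][16];  // assumes the kernel is launched with 16x16 blocks
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    bool inside = (x < width) && (y < height);
    if (inside) tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();  // reached by every thread in the block; mirrors the original pseudocode
    if (inside) {
        uchar4 p = tile[threadIdx.y][threadIdx.x];
        unsigned char c = luma(p.x, p.y, p.z);
        out[y * width + x] = make_uchar4(c, c, c, 0);
    }
}

int main()
{
    // Host-side sanity check of the helper: a mid-gray pixel maps to itself.
    assert(luma(100, 100, 100) == 100);

    const int width = 64, height = 48;  // hypothetical image size
    const size_t n = static_cast<size_t>(width) * height;
    const size_t bytes = n * sizeof(uchar4);

    std::vector<uchar4> h_in(n), h_direct(n), h_staged(n);
    for (size_t i = 0; i < n; ++i)
        h_in[i] = make_uchar4((unsigned char)(i % 256), (unsigned char)((i * 3) % 256),
                              (unsigned char)((i * 7) % 256), 255);

    uchar4 *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

    gray_global<<<grid, block>>>(d_in, d_out, width, height);
    cudaMemcpy(h_direct.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    gray_shared<<<grid, block>>>(d_in, d_out, width, height);
    cudaMemcpy(h_staged.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    // Count pixels where the two kernels disagree.
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_direct[i].x != h_staged[i].x) ++mismatches;
    printf("pixels that differ between the two kernels: %zu\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches == 0 ? 0 : 1;
}
```

Shared memory tends to pay off when several threads in a block need the same input element, as in a blur or gradient stencil where neighbouring pixels overlap. A per-pixel operation like grayscale has no such overlap, so reading each pixel as a single `uchar4` straight from global memory is usually the simpler choice.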
