darot
January 22, 2009, 1:58am
1
Dear all:
After my testing, even when all loads/stores of global memory are coalesced by staging through shared memory, the transfer time is still there (because I have to call __syncthreads() after the read, like the code below).
__shared__ float share[BLOCK_SIZESglLRX][BLOCK_SIZESglLRY];
unsigned int xIndex, yIndex, index_in;
xIndex = blockIdx.x * BLOCK_SIZESglLRX + threadIdx.x;
yIndex = blockIdx.y * BLOCK_SIZESglLRY + threadIdx.y;
index_in = yIndex * devImgSizeX + xIndex;
if (xIndex < devImgSizeX && yIndex < devImgSizeY) {
    share[threadIdx.x][threadIdx.y] = *(S + index_in);
}
__syncthreads();
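Note that a load like the one above only stages each block's interior tile; a non-separable convolution also needs a halo (apron) of neighboring pixels around the tile before __syncthreads(). A minimal sketch of such a staging load, assuming the block is launched with (TILE + 2*RADIUS) threads per side (TILE, RADIUS, and loadTileWithApron are illustrative names, not from the post):

```cuda
#define TILE   16   // interior tile edge (illustrative)
#define RADIUS 1    // apron width for a 3x3 kernel

__global__ void loadTileWithApron(const float *S, int width, int height)
{
    // Tile plus a RADIUS-wide apron on every side.
    __shared__ float tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];

    // Shift so the first threads of the block cover the apron.
    int x = blockIdx.x * TILE + threadIdx.x - RADIUS;
    int y = blockIdx.y * TILE + threadIdx.y - RADIUS;

    // Clamp to the image border so apron threads read valid pixels.
    int cx = min(max(x, 0), width - 1);
    int cy = min(max(y, 0), height - 1);

    tile[threadIdx.y][threadIdx.x] = S[cy * width + cx];
    __syncthreads();
    // ... convolve using tile[][] here ...
}
```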
If I want to do a 2D convolution (not separable), such as a smoothing or Laplacian filter, I think using texture memory will be faster than shared memory.
Is that right?
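For comparison, a texture-memory version of a 2D convolution might look like the sketch below, using the legacy texture reference API of that era. The texture cache serves the 2D-local neighbor reads, so there is no shared-memory staging and no __syncthreads(). The names tex, conv3x3, and the 3x3 weight layout are assumptions for illustration, not code from the post:

```cuda
// Texture reference bound to the input image (legacy texture API).
texture<float, 2, cudaReadModeElementType> tex;

__global__ void conv3x3(float *out, int width, int height,
                        const float *kern /* 9 weights, row-major */)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            // tex2D handles out-of-range coordinates according to the
            // texture's address mode (e.g. clamp), so no border test here.
            sum += kern[(dy + 1) * 3 + (dx + 1)] * tex2D(tex, x + dx, y + dy);

    out[y * width + x] = sum;
}
```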
I’m not sure if it is faster, but I know that shared memory is very fast. Are you interested in computer vision? Do you know where I can find information about it and CUDA?