Problem with shared memory

I am new to CUDA and am trying to use shared memory in a matrix multiplication, but the result always comes out 0 (zero). When using global memory the result is correct. My graphics card is an NVS 3100.

The values are SIZE = Width = 16 and TILE_DIM = 4.
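(The launch configuration is not shown in the post; for those values it would presumably be something like the following, with grid/block names as an assumption:)

```cuda
// Assumed launch configuration for Width = 16, TILE_DIM = 4:
dim3 block(TILE_DIM, TILE_DIM);                 // 4 x 4 threads per block
dim3 grid(Width / TILE_DIM, Width / TILE_DIM);  // 4 x 4 blocks
MatrixMul<<<grid, block>>>(Md, Nd, Pd, Width);
```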

With SHARED MEMORY - Problem
__global__ void MatrixMul(int *Md, int *Nd, int *Pd, int Width) {

int bx = blockIdx.x; 
int by = blockIdx.y;
int tx = threadIdx.x;
int ty = threadIdx.y;

__shared__ int Ad[TILE_DIM][TILE_DIM];
__shared__ int Bd[TILE_DIM][TILE_DIM];

int Row = by*TILE_DIM + ty;
int Col = bx*TILE_DIM + tx;

int PValue = 0;

for (int k = 0; k < Width / 2; k++) {
	Ad[ty][tx] = Md[Row*Width + (k*Width + tx)];
	Bd[ty][tx] = Nd[(k*Width + ty)*Width + Col];
	__syncthreads();
}

for (int k = 0; k<Width; ++k){
	PValue += Ad[tx][k] * Bd[k][ty];
	__syncthreads();
}
Pd[Row*Width+Col] = PValue;

}

With GLOBAL MEMORY - OK
__global__ void MatrixMul(int *Md, int *Nd, int *Pd, int Width) {
int bx = blockIdx.x; int by = blockIdx.y;
int tx = threadIdx.x; int ty = threadIdx.y;

int Row = by*TILE_DIM + ty;
int Col = bx*TILE_DIM + tx;

int PValue = 0;

for (int k = 0; k < Width; ++k) {
	PValue += Md[Row*Width + k] * Nd[k*Width + Col];
}
__syncthreads();

Pd[Row*Width+Col] = PValue;

}

Help me, please!

perhaps you are out of shared memory
what compute capability (cc) is the NVS 3100?

see if the kernel even runs - either with proper error checking, or by adding a breakpoint on the first kernel line, and using the debugger

or, perhaps your indices are wrong, when using shared memory
easiest seems to simply add a breakpoint to

PValue += Ad[tx][k] * Bd[k][ty];

and noting whether the value actually increments, for different threads
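(For comparison, a standard tiled multiplication interleaves the tile loads and the partial products inside a single loop over tiles, rather than splitting them into two loops as in the posted kernel. A sketch, not the poster's code, assuming Width is a multiple of TILE_DIM:)

```cuda
__global__ void MatrixMulTiled(int *Md, int *Nd, int *Pd, int Width)
{
    __shared__ int Ad[TILE_DIM][TILE_DIM];
    __shared__ int Bd[TILE_DIM][TILE_DIM];

    int Row = blockIdx.y * TILE_DIM + threadIdx.y;
    int Col = blockIdx.x * TILE_DIM + threadIdx.x;
    int PValue = 0;

    for (int m = 0; m < Width / TILE_DIM; ++m) {
        // each thread loads one element of each tile
        Ad[threadIdx.y][threadIdx.x] = Md[Row * Width + m * TILE_DIM + threadIdx.x];
        Bd[threadIdx.y][threadIdx.x] = Nd[(m * TILE_DIM + threadIdx.y) * Width + Col];
        __syncthreads();

        // partial dot product over this tile; note the index order [ty][k] * [k][tx]
        for (int k = 0; k < TILE_DIM; ++k)
            PValue += Ad[threadIdx.y][k] * Bd[k][threadIdx.x];
        __syncthreads();
    }
    Pd[Row * Width + Col] = PValue;
}
```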

The breakpoint inside the Kernel does not stop the execution … :-((

add a breakpoint after the kernel launch, in the host code

if that breakpoint is hit, and the debugger does not jump to the breakpoint in the kernel, it is safe to assume the kernel launch failed
you can then verify that via proper error checking - cudaGetLastError(), etc
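(Concretely, the check after the launch might look like this sketch; both calls are needed, since cudaGetLastError() only catches launch errors and cudaDeviceSynchronize() catches errors raised while the kernel runs:)

```cuda
MatrixMul<<<grid, block>>>(Md, Nd, Pd, Width);
cudaError_t err = cudaGetLastError();       // launch/configuration errors
if (err != cudaSuccess)
    printf("launch failed: %s\n", cudaGetErrorString(err));
err = cudaDeviceSynchronize();              // errors during kernel execution
if (err != cudaSuccess)
    printf("kernel failed: %s\n", cudaGetErrorString(err));
```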

As little jimmy has indicated, you ought to be doing proper cuda error checking, before asking others for help. Not sure what that is? google “proper cuda error checking” and take the first hit.

You can also try running your code as-is with cuda-memcheck.

I’m using error handling (gpuErrchk) on the allocations and the Host->Device and Device->Host transfers, and no error occurs. I think the error is in the transfer to shared memory.

The error checking on allocations and transfers is not sufficient to catch all possible issues. Perhaps you should actually look at proper cuda error checking, and implement it.

Perhaps you should provide a complete code, rather than just the kernel.

Perhaps you should run your code with cuda-memcheck.