Hello CUDA! program not working - please help

alan48 · February 17, 2010, 6:53pm

Hi–

I’m trying to learn CUDA and my simple ‘hello world’ / ‘hello cuda’ program isn’t working.

I’m just trying to multiply two square matrices, with the result matrix split into tiles so that each tile is computed by its own thread block.

According to the output the matrix that comes back is zero-filled (but should have non-zero numbers everywhere).

C:\CUDA\Projects\MatrixMultiply\x64\Release>MatrixMultiply.exe
CUDA initialized.
M =
1.000000 1.000000 1.000000 1.000000
2.000000 2.000000 2.000000 2.000000
3.000000 3.000000 3.000000 3.000000
4.000000 4.000000 4.000000 4.000000

N =
2.000000 2.000000 2.000000 2.000000
3.000000 3.000000 3.000000 3.000000
4.000000 4.000000 4.000000 4.000000
5.000000 5.000000 5.000000 5.000000

hostresult =
0.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000

Press ENTER to exit…
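
For comparison, here is a minimal CPU reference for the product that hostresult should contain. It is only a sketch: the name matrixMulHost is hypothetical (it is not part of the attached project), and it assumes square, row-major matrices, the same layout the kernel indexes with row * width + col. It is plain host C++, so it should also compile inside a .cu file.

```cpp
// Hypothetical helper (not from the original project): plain CPU product of
// two square, row-major matrices, P = M * N.
void matrixMulHost(const float* M, const float* N, float* P, int width)
{
    for (int row = 0; row < width; ++row) {
        for (int col = 0; col < width; ++col) {
            float sum = 0.0f;
            // Dot product of row `row` of M with column `col` of N.
            for (int k = 0; k < width; ++k)
                sum += M[row * width + k] * N[k * width + col];
            P[row * width + col] = sum;
        }
    }
}
```

With the M and N printed above (width 4), every entry in row i of the result is (i + 1) * (2 + 3 + 4 + 5), so the four rows should read 14, 28, 42 and 56 rather than 0.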

My kernel:
__global__ void MatrixMulKernel(float* Md, float* Nd, float* Pd, int width, int tile_width)
{
    int row = blockIdx.y * tile_width + threadIdx.y;
    int col = blockIdx.x * tile_width + threadIdx.x;

    float Pvalue = 0;

    int k;

    for (k = 0; k < width; ++k)
        Pvalue += Md[row * width + k] * Nd[k * width + col];

    Pd[row * width + col] = Pvalue;
}

And this function launches the kernel:
void MatrixMulOnDevice(float* M, float* N, float* P, int width, int tile_width)
{
    int size = width * width * sizeof(float);
    float* Md; float* Nd; float* Pd;

    //allocate Md, Nd on device & copy host-generated values
    cudaMalloc((void**)&Md, size);
    cudaMemcpy((void**)&Md, M, size, cudaMemcpyHostToDevice);

    cudaMalloc((void**)&Nd, size);
    cudaMemcpy((void**)&Nd, N, size, cudaMemcpyHostToDevice);

    //allocate Pd on device
    cudaMalloc((void**)&Pd, size);

    //declaring 2x2 threads PER BLOCK
    dim3 dimBlock(2,2);

    //declaring tile_width x tile_width tiles (1 tile = 1 block) PER GRID
    dim3 dimGrid(tile_width, tile_width);

    MatrixMulKernel<<<dimGrid, dimBlock>>>(Md,Nd,Pd,width,tile_width);

    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
}

Full code also attached. Any help is greatly appreciated

Thanks
-Alan

downforme · February 17, 2010, 8:26pm

I can't find any attached code, but did you remember to copy the result back?

In your function, you call the kernel and free all the device memory afterwards.

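You need to copy the result back with something like cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost); before freeing. P and Pd are already pointers, so they are passed as they are, without a (void**)& cast. A blocking cudaMemcpy on the default stream waits for the kernel to finish, so a separate synchronize call isn't strictly needed here.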

alan48 · February 17, 2010, 9:05pm

I solved it: I mistakenly took the address of what was already a pointer (the (void**)&Md and (void**)&Nd passed to cudaMemcpy)... sigh...

Sorry about not posting the full code.

Thanks for your response though

-Alan
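
For anyone landing here with the same zero-filled result, below is a sketch of how the launcher could look with both fixes from this thread applied: the device pointers are passed to cudaMemcpy by value (Md, not (void**)&Md), and the result is copied back to P before the device buffers are freed. Several details are assumptions that go beyond the original post: the function returns a cudaError_t so a failing call is visible, the kernel gets a bounds guard, the block is tile_width x tile_width threads, and the grid is rounded up from width / tile_width.

```cuda
#include <cuda_runtime.h>

// Same computation as the kernel in the original post: one thread per element of P.
__global__ void MatrixMulKernel(const float* Md, const float* Nd, float* Pd,
                                int width, int tile_width)
{
    int row = blockIdx.y * tile_width + threadIdx.y;
    int col = blockIdx.x * tile_width + threadIdx.x;

    // Assumption: guard added so partially filled edge tiles stay in bounds.
    if (row >= width || col >= width) return;

    float Pvalue = 0.0f;
    for (int k = 0; k < width; ++k)
        Pvalue += Md[row * width + k] * Nd[k * width + col];

    Pd[row * width + col] = Pvalue;
}

// Assumption: returns the first CUDA error seen (the original returned void).
cudaError_t MatrixMulOnDevice(const float* M, const float* N, float* P,
                              int width, int tile_width)
{
    size_t size = (size_t)width * width * sizeof(float);
    float* Md = NULL; float* Nd = NULL; float* Pd = NULL;

    // cudaMalloc writes into the pointer, so it needs the pointer's address.
    cudaError_t err = cudaMalloc((void**)&Md, size);
    if (err == cudaSuccess) err = cudaMalloc((void**)&Nd, size);
    if (err == cudaSuccess) err = cudaMalloc((void**)&Pd, size);

    // cudaMemcpy needs the device address itself: Md, not &Md.
    if (err == cudaSuccess) err = cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(Nd, N, size, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // One block per tile; enough blocks to cover the whole matrix.
        dim3 dimBlock(tile_width, tile_width);
        int blocks = (width + tile_width - 1) / tile_width;
        dim3 dimGrid(blocks, blocks);

        MatrixMulKernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, width, tile_width);
        err = cudaGetLastError();  // reports launch configuration errors
    }

    // Copy the result back BEFORE freeing; this call waits for the kernel.
    if (err == cudaSuccess) err = cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so this is safe on any path.
    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
    return err;
}
```

Calling MatrixMulOnDevice(M, N, hostresult, 4, 2) and printing cudaGetErrorString() of the return value when it is not cudaSuccess is one way to see which step failed instead of silently getting zeros.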
