Matrix Mult Result is zero!

faramarz · July 11, 2010, 6:56am

Dear Friends,

I have code for a very simple matrix multiplication in CUDA.
Everything seems fine, but the results come back as 0. I checked the run time for different dimension sizes and it looks OK.

I need your help.

Best Regards

my code is below:

// includes
#include <cutil_inline.h>
#include <shrUtils.h>
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

// defines, project
const int N = 1024;
const int blocksize = 32; // Max size of HW block is 32, so values greater than 32 don't affect the processing time.

__global__
void mul_matrix( float* a, float* b, float* c, int N )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int index = i + j*N;
    c[index] = 0;
    if ( i < N && j < N )
        for (int k=0; k<N; k++)
            c[index] += a[j*N+k] * b[i+k*N];
}

int main() {
float *a = new float[N*N];
float *b = new float[N*N];
float *c = new float[N*N];

for ( int i = 0; i < N*N; i++) {	a[i] = 1.0f; 	b[i] = 2.9f; 	c[i] = 3.5f;} // initializing

float *ad, *bd, *cd;
const int size = N*N*sizeof(float);

cudaMalloc( (void**)&ad, size );
cudaMalloc( (void**)&bd, size );
cudaMalloc( (void**)&cd, size );

cudaMemcpy( ad, a, size, cudaMemcpyHostToDevice );
cudaMemcpy( bd, b, size, cudaMemcpyHostToDevice );

dim3 dimBlock( blocksize, blocksize );
dim3 dimGrid( N/dimBlock.x, N/dimBlock.y );
mul_matrix<<<dimGrid, dimBlock>>>( ad, bd, cd, N );

cudaMemcpy( c, cd, size, cudaMemcpyDeviceToHost );

for (int k=0; k<3; k++)
{
   printf("a %d = %f\n", k, a[k]);
   printf("b %d = %f\n", k, b[k]);
   printf("c %d = %f\n", k, c[k]);
}

cudaFree( ad ); cudaFree( bd ); cudaFree( cd );
delete[] a; delete[] b; delete[] c;

int d;
scanf("%d",&d);

return EXIT_SUCCESS;

}

Cuda_Libre · July 11, 2010, 8:02am

Hello,

You’re using blocks of 32*32 = 1024 threads: isn’t that too much for your hardware? :)

(launch the deviceQuery sample from the SDK to know the maximum number of threads/block your card supports)
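
The same limit that deviceQuery prints can also be read from inside a program. The following is a minimal sketch, not code from this thread: `blockFits` and `printBlockLimit` are hypothetical helper names, and it only looks at the total thread count per block, not the per-dimension limits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): true if a block of the given
// shape stays within the per-block thread limit passed in.
bool blockFits(dim3 block, int maxThreadsPerBlock)
{
    return block.x * block.y * block.z <= (unsigned int)maxThreadsPerBlock;
}

// Hypothetical helper: prints the per-block thread limit of one device and
// whether the 32x32 block from the post, or a 16x16 block, fits under it.
cudaError_t printBlockLimit(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    printf("%s: max threads per block = %d\n", prop.name, prop.maxThreadsPerBlock);

    dim3 big(32, 32);    // 1024 threads, as in the original post
    dim3 small(16, 16);  // 256 threads
    printf("32x32 fits: %s\n", blockFits(big, prop.maxThreadsPerBlock) ? "yes" : "no");
    printf("16x16 fits: %s\n", blockFits(small, prop.maxThreadsPerBlock) ? "yes" : "no");
    return cudaSuccess;
}
```

Calling `printBlockLimit(0)` at the start of `main` (device 0 is an assumption) shows whether `blocksize` has to be lowered before the kernel is launched.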

faramarz · July 11, 2010, 9:01am

Thank you my friend

It really works OK.
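
A likely reason the original program printed zeros without any complaint is that nothing in it checks whether the kernel launch succeeded: when the block is too large for the device, the launch is rejected, the kernel never runs, and the copy back returns whatever happened to be in `cd`. The sketch below shows one way to surface that error. It is an illustration under assumptions, not the poster's final code: `blocksFor`, `mul_matrix_checked` and `launchChecked` are hypothetical names, the zeroing is moved inside the bounds check, and `cudaDeviceSynchronize` assumes CUDA 4.0 or later (older toolkits used `cudaThreadSynchronize`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Ceiling division: number of blocks of edge blockEdge needed to cover n elements.
unsigned int blocksFor(int n, int blockEdge)
{
    return (n + blockEdge - 1) / blockEdge;
}

// Same row-major product as in the post; threads outside the matrix do nothing,
// and each thread accumulates in a register before writing its result once.
__global__ void mul_matrix_checked(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i < n && j < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
            sum += a[j * n + k] * b[k * n + i];
        c[j * n + i] = sum;
    }
}

// Launches the kernel on device pointers and reports any launch or execution error.
cudaError_t launchChecked(const float* ad, const float* bd, float* cd, int n, int blockEdge)
{
    dim3 dimBlock(blockEdge, blockEdge);
    dim3 dimGrid(blocksFor(n, blockEdge), blocksFor(n, blockEdge));
    mul_matrix_checked<<<dimGrid, dimBlock>>>(ad, bd, cd, n);

    cudaError_t err = cudaGetLastError();   // an invalid launch configuration shows up here
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel runs
    if (err != cudaSuccess)
        printf("mul_matrix_checked failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

With the buffers from the post, `launchChecked(ad, bd, cd, N, 16)` would replace the bare `mul_matrix<<<...>>>` call; passing 32 as the block edge on a device limited to 512 threads per block should then print an error instead of silently returning zeros.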
