error in the result of using shared memory

Manalo · May 29, 2015, 7:14pm

Dear all;
I wrote this program to multiply each element of a matrix by a fixed number (2) using shared memory. The program compiles and runs without errors, but the result is not what I expected:
only the first row of the matrix is multiplied by 2; the remaining rows are unchanged.
How can I solve this problem?

#include "cuda.h"
#include "cuda_runtime.h"

#include "device_launch_parameters.h"
#include <stdio.h>
#include <math.h>

#define N 4



__global__ void calculate_ratios(float *a,float *b)
{
   
	int tx = threadIdx.x;
	int ty = threadIdx.y;

	__shared__ float temp[N][N];

	temp[ty][tx] = a[ty*(N ) + tx];
	__syncthreads();

	
	temp[ty][tx] = temp[ty][tx]*2;
	
	
	__syncthreads();
	b[ty*(N ) + tx] = temp[ty][tx];
	__syncthreads();

	}


int main()
{

	float a_h[N*N] = { 3, 5, 2, 19, 2, 3, 11, 11, 1, 2, 2, 11, 0, 0, 0, 0 };
	
	for (int i = 0; i < N; i++){
		
		for (int j = 0; j < N; j++){
			printf("%.1f ", a_h[i *N + j]);
		
		}
		printf("\n");
	}


	printf("\n");	printf("\n");

	float *a_d;
	float *b_d;
	cudaMalloc((void **)&a_d, N*sizeof(float));
	cudaMalloc((void **)&b_d, N*sizeof(float));

	cudaMemcpy(a_d, a_h, N*sizeof(float), cudaMemcpyHostToDevice);
	dim3 dimBlock(N, N, 1);
	calculate_ratios << <1, dimBlock >> >(a_d,b_d);
   
	 cudaMemcpy(a_h, b_d, N*sizeof(float), cudaMemcpyDeviceToHost);
	 
	 for (int i = 0; i < N; i++){

		 for (int j = 0; j < N; j++){
			 printf("%.1f ", a_h[i *N + j]);

		 }
		 printf("\n");
	 }

	 
	 cudaFree(a_d);
	 cudaFree(b_d);
}

Robert_Crovella · May 29, 2015, 8:37pm

First of all, you should get in the habit of using proper CUDA error checking. If you don’t know what that is, google “proper cuda error checking”, take the first hit, and study it.
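
For readers who have not seen the pattern, here is a minimal sketch of what such error checking often looks like. The names `CUDA_CHECK` and `check_kernel` are hypothetical helpers invented for this illustration, not part of the CUDA runtime; only `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` are real runtime calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original thread): wraps a CUDA
// runtime call and aborts with file/line information if the call did not
// return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical helper: call immediately after a kernel launch.
static void check_kernel(void)
{
    CUDA_CHECK(cudaGetLastError());       // reports launch errors (e.g. bad configuration)
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran
}
```

With these in place, each runtime call in the posted program could be wrapped, for example `CUDA_CHECK(cudaMalloc((void **)&a_d, bytes));`, and `check_kernel();` could follow the kernel launch. Note that a small out-of-bounds access inside a kernel does not always produce a runtime error, which is why a memory checker is still worth running.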

Second, running your code with cuda-memcheck may help you to get an idea of where the problem is.

Finally, these allocations cannot be correct:

cudaMalloc((void **)&a_d, N*sizeof(float));
cudaMalloc((void **)&b_d, N*sizeof(float));

You need storage for an N*N matrix. N*sizeof(float) is not enough. You need N*N*sizeof(float).
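
A minimal sketch of the corrected sizes, assuming the same 4x4 single-block layout as the posted program. The kernel name `scale_by_two` and the host wrapper `double_matrix` are hypothetical names chosen for this illustration. The same byte count is used for both allocations and both copies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4

// Same idea as the kernel in the question: stage the N x N matrix in shared
// memory, double each element, and write it back out.
__global__ void scale_by_two(const float *a, float *b)
{
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    __shared__ float temp[N][N];

    temp[ty][tx] = a[ty * N + tx];
    // No barrier is needed in this sketch: each thread reads back only the
    // element it wrote itself.
    b[ty * N + tx] = temp[ty][tx] * 2.0f;
}

// Hypothetical host wrapper: copies an N x N matrix to the device, runs the
// kernel, copies the result back, and returns the first error encountered.
cudaError_t double_matrix(const float *in_h, float *out_h)
{
    const size_t bytes = N * N * sizeof(float);  // N*N elements, not N
    float *a_d = NULL;
    float *b_d = NULL;

    cudaError_t err = cudaMalloc((void **)&a_d, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&b_d, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(a_d, in_h, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(N, N, 1);
        scale_by_two<<<1, block>>>(a_d, b_d);
        err = cudaGetLastError();
    }
    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(out_h, b_d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(a_d);  // cudaFree(NULL) is a no-op, so this is safe on every path
    cudaFree(b_d);
    return err;
}
```

Calling `double_matrix(a_h, a_h)` from the posted `main` in place of the allocation, copy, and launch lines should then double every row, so the first row `3 5 2 19` becomes `6 10 4 38` and the all-zero last row stays zero.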

Manalo · May 29, 2015, 9:23pm

Thank you, txbob.
I found the problem: it was in the allocation size and in the copy operations.
The element count must be N*N, not N.
