cudaMemcpy problem

Fry29June · June 28, 2012, 10:45pm

I'm trying to run a kernel that sets each element of an array to the value 7, but I can't figure out why the result doesn't change. I suspect the cudaMemcpy is not working. Here is the code:

__global__ void kernel(int * d_A ){
	int idx = blockIdx.x * blockDim.x + threadIdx.x;
	d_A[idx] = 7;
}

int main(int argc, char * argv[])	{

	int r = 10; // vector dimension
	srand(time(NULL));

	int *  V = (int*) malloc(sizeof(int)*r);
	for(int j=0;j<r;j++)
		V[j] = rand();

	printf("\n PREVIEW\n");
	for(int h=0;h<r;h++)
		printf("\n %d\n", V[h]);

	int *d_A = 0;

	cudaMalloc((void**) &d_A, r*sizeof(int));
	cudaMemset(d_A,1,r*sizeof(int));

	dim3 dimBlock(10*sizeof(int));
	dim3 dimGrid(ceil(r/(int)10));

	kernel<<<dimGrid,dimBlock>>>(d_A);

	cudaMemcpy(V,d_A,r*sizeof(int), cudaMemcpyDeviceToHost);

	printf("\n Output V:\n");
	for(int h=0;h<r;h++)
		printf("\n %d\n", V[h]);
	// output is the same => V has not been modified by the cudaMemcpy

	return 0;
}

tera · June 29, 2012, 1:30am

Your kernel performs out-of-bounds array accesses because you start it with too many threads (you only need 10 threads, not 10*sizeof(int)). The expression for the grid size also looks quite fragile to me. The common way to express this without any floating-point arithmetic is an integer division that rounds up. Furthermore, your code will fail if the number of elements is not an integer multiple of the block size, because the additional threads that come from rounding up the number of blocks would also perform out-of-bounds array accesses. This can be prevented by explicitly disabling the unneeded threads inside the kernel.
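The points in this reply can be sketched as follows. This is only an illustration, not the code from the thread. The exact grid-size expression the reply had in mind is not preserved, so the widely used `(n + threadsPerBlock - 1) / threadsPerBlock` idiom is an assumption here. The names `blocksFor`, `fillWithSeven` and `CUDA_CHECK` are hypothetical helpers introduced for the example. In the original launch, `r/(int)10` is already an integer division, so `ceil()` has nothing left to round up. The error checks are an addition beyond the reply and assume the standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original post): abort with a message on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Integer ceiling division: number of blocks needed to cover n elements.
static int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// The kernel receives the element count so that surplus threads do nothing.
__global__ void fillWithSeven(int *d_A, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        d_A[idx] = 7;
    }
}

int main() {
    const int r = 10;              // vector dimension, as in the question
    const int threadsPerBlock = 4; // assumed value, deliberately not a divisor of r

    int *V = (int *)malloc(r * sizeof(int));
    if (V == NULL) return EXIT_FAILURE;
    for (int j = 0; j < r; j++) V[j] = -1;

    int *d_A = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_A, r * sizeof(int)));

    // 3 blocks of 4 threads = 12 threads; the last 2 are disabled by the guard.
    fillWithSeven<<<blocksFor(r, threadsPerBlock), threadsPerBlock>>>(d_A, r);
    CUDA_CHECK(cudaGetLastError());      // reports an invalid launch configuration
    CUDA_CHECK(cudaDeviceSynchronize()); // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(V, d_A, r * sizeof(int), cudaMemcpyDeviceToHost));
    for (int h = 0; h < r; h++) printf("%d\n", V[h]);

    CUDA_CHECK(cudaFree(d_A));
    free(V);
    return 0;
}
```

Passing the element count to the kernel keeps the bounds check independent of the launch configuration, so the block size can be changed without touching the kernel. Checking the return value of every runtime call also helps narrow down whether a problem lies in the launch or in the copy.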

Also have a look at the tips in my signature.

Fry29June · June 29, 2012, 8:13am

Thank you sir, you solved my problem.
