bug in 1 block, 1 thread example.

skchoe · March 11, 2010, 8:46pm

Hi, I have a problem running some simple code.

The code is really simple. It uses only 1 block and 1 thread. The kernel fills a two-element array with the values 1 and 2, respectively.

I cannot get the values back on the host. I want to know whether my memory allocation and argument passing are right.

Any help?

kernel part:

extern "C"
__global__ void cpyTest(int* answer)
{
	answer[0] = 1;
	answer[1] = 2;
}

Part of host code:

CUdeviceptr d_0;
CU_SAFE_CALL(cuMemAlloc(&d_0, sizeof(int) * 2));

// Calling kernel
int sf = sizeof(int);
CU_SAFE_CALL(cuFuncSetBlockShape(cpyTest, 1, 1, 1));
CU_SAFE_CALL(cuParamSeti(cpyTest, 0, d_0));
CU_SAFE_CALL(cuParamSetSize(cpyTest, sf*2));
CU_SAFE_CALL(cuLaunchGrid(cpyTest, 1, 1));

int *h_0 = (int*)malloc(sf * 2);
CU_SAFE_CALL(cuMemcpyDtoH(h_0, d_0, sf * 2));
printf("answer = %d, %d\n", h_0[0], h_0[1]);

After running, printf wrote:

answer = -1213066928, 134817120

It seems that cuMemcpyDtoH doesn’t copy from device to host. Any ideas are appreciated.

S.
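One detail in the driver API snippet that may be worth checking: cuParamSetSize expects the total size in bytes of the kernel's parameter list (here a single pointer), not the size of the buffer the pointer refers to. The thread does not confirm that this is the cause of the garbage output, so treat the following as a sketch only. It sidesteps manual parameter packing by using cuLaunchKernel, which later toolkits (CUDA 4.0 and up) provide in place of cuParamSet*/cuLaunchGrid, and it compiles the kernel at run time with NVRTC purely so the example fits in one file. The CHECK_CU and CHECK_NVRTC macros, the runDriverCpyTest function, and the build line are illustrative assumptions, not part of the original post.

```cuda
// Build (assumption): nvcc driver_cpy_test.cu -o driver_cpy_test -lcuda -lnvrtc
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda.h>
#include <nvrtc.h>

// Hypothetical helper: abort with a message when a driver API call fails.
#define CHECK_CU(call)                                                   \
    do {                                                                 \
        CUresult rc_ = (call);                                           \
        if (rc_ != CUDA_SUCCESS) {                                       \
            const char* msg_ = NULL;                                     \
            cuGetErrorString(rc_, &msg_);                                \
            fprintf(stderr, "%s failed: %s\n", #call,                    \
                    msg_ ? msg_ : "unknown error");                      \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// Hypothetical helper: same idea for NVRTC calls.
#define CHECK_NVRTC(call)                                                \
    do {                                                                 \
        nvrtcResult rc_ = (call);                                        \
        if (rc_ != NVRTC_SUCCESS) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                    \
                    nvrtcGetErrorString(rc_));                           \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// Same kernel as in the post, kept as a string so NVRTC can compile it.
static const char* kSource =
    "extern \"C\" __global__ void cpyTest(int* answer)\n"
    "{\n"
    "    answer[0] = 1;\n"
    "    answer[1] = 2;\n"
    "}\n";

// Runs cpyTest with 1 block of 1 thread and copies both ints into out.
void runDriverCpyTest(int out[2])
{
    // Compile the kernel source to PTX at run time.
    nvrtcProgram prog;
    CHECK_NVRTC(nvrtcCreateProgram(&prog, kSource, "cpyTest.cu", 0, NULL, NULL));
    CHECK_NVRTC(nvrtcCompileProgram(prog, 0, NULL));
    size_t ptxSize = 0;
    CHECK_NVRTC(nvrtcGetPTXSize(prog, &ptxSize));
    std::vector<char> ptx(ptxSize);
    CHECK_NVRTC(nvrtcGetPTX(prog, ptx.data()));
    CHECK_NVRTC(nvrtcDestroyProgram(&prog));

    // Set up the device, context, module, and function handle.
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CHECK_CU(cuInit(0));
    CHECK_CU(cuDeviceGet(&dev, 0));
    CHECK_CU(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK_CU(cuCtxSetCurrent(ctx));
    CHECK_CU(cuModuleLoadData(&mod, ptx.data()));
    CHECK_CU(cuModuleGetFunction(&fn, mod, "cpyTest"));

    const size_t bytes = 2 * sizeof(int);
    CUdeviceptr d_0;
    CHECK_CU(cuMemAlloc(&d_0, bytes));

    // Each entry points at one kernel argument; sizes are taken from the kernel.
    void* args[] = { &d_0 };
    CHECK_CU(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, NULL, args, NULL));
    CHECK_CU(cuCtxSynchronize());

    // Copy back both ints, not just one.
    CHECK_CU(cuMemcpyDtoH(out, d_0, bytes));

    CHECK_CU(cuMemFree(d_0));
    CHECK_CU(cuModuleUnload(mod));
    CHECK_CU(cuDevicePrimaryCtxRelease(dev));
}

int main()
{
    int h_0[2] = {0, 0};
    runDriverCpyTest(h_0);
    printf("answer = %d, %d\n", h_0[0], h_0[1]);
    return 0;
}
```

Because every call is checked, a failed launch or copy stops the program with a message instead of printing whatever happened to be in the uninitialized host buffer.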

skchoe · March 11, 2010, 9:26pm

A simpler version using the runtime API also gives strange results.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <cuda.h>
#include <cutil.h>
#include <math_functions.h>

extern "C"
__global__ void
cpyTest(int* answer)
{
	answer[0] = 1;
	answer[1] = 2;
}

////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int
main(int argc, char** argv)
{
	int sf = sizeof(int);
	int* d_0;
	CUDA_SAFE_CALL(cudaMalloc( (void**) &d_0, sf * 2));

	dim3 threads(1, 1);
	dim3 grids(1, 1);

	// Calling kernel
	cpyTest<<<grids, threads>>>(d_0);

	int *h_0 = (int*)malloc(sf);
	CU_SAFE_CALL(cudaMemcpy(h_0, d_0, sf, cudaMemcpyDeviceToHost));
	printf("answer = %d, %d\n", h_0[0], h_0[1]);

	free(h_0);
	CUT_EXIT(argc, argv);
}

The answer should be 1, 2, but it prints:

answer = 1, 828337523

Press ENTER to exit…

Thanks,

S.

mfatica · March 11, 2010, 9:44pm

cudaMemcpy(h_0, d_0, sf, cudaMemcpyDeviceToHost)

should be

cudaMemcpy(h_0, d_0, sf*2, cudaMemcpyDeviceToHost)
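Putting that fix into the full program: the host buffer has the same problem, since malloc(sf) reserves room for only one int while printf reads h_0[1]. Below is a minimal sketch, assuming a CUDA-capable device and using the plain runtime API in place of cutil. The CHECK_CUDA macro and the runCpyTest function are illustrative names, not part of CUDA or of the original post.

```cuda
// Build (assumption): nvcc cpy_test.cu -o cpy_test
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message when a runtime API call fails.
#define CHECK_CUDA(call)                                                 \
    do {                                                                 \
        cudaError_t rc_ = (call);                                        \
        if (rc_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s failed: %s\n", #call,                    \
                    cudaGetErrorString(rc_));                            \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

extern "C"
__global__ void cpyTest(int* answer)
{
    answer[0] = 1;
    answer[1] = 2;
}

// Runs cpyTest with 1 block of 1 thread and copies both ints into out.
void runCpyTest(int out[2])
{
    // One byte count, used for every allocation and copy of the array.
    const size_t bytes = 2 * sizeof(int);

    int* d_0 = NULL;
    CHECK_CUDA(cudaMalloc((void**)&d_0, bytes));

    cpyTest<<<1, 1>>>(d_0);
    CHECK_CUDA(cudaGetLastError());  // reports launch errors

    // Host buffer sized for both elements.
    int* h_0 = (int*)malloc(bytes);
    if (h_0 == NULL) {
        fprintf(stderr, "malloc failed\n");
        exit(EXIT_FAILURE);
    }

    // Copy both elements back; this call also waits for the kernel to finish.
    CHECK_CUDA(cudaMemcpy(h_0, d_0, bytes, cudaMemcpyDeviceToHost));

    out[0] = h_0[0];
    out[1] = h_0[1];

    free(h_0);
    CHECK_CUDA(cudaFree(d_0));
}

int main()
{
    int result[2] = {0, 0};
    runCpyTest(result);
    printf("answer = %d, %d\n", result[0], result[1]);
    return 0;
}
```

Keeping the byte count in a single variable makes it harder for the allocation, the copy, and the print to disagree about how many elements there are.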
