cudaMemcpy() works in emu mode but not in release mode

christoph · May 24, 2007, 8:45am

Hi all,

I have a problem with cudaMemcpy(). I allocate memory on the host for my result with malloc() (I also tried cuMemAllocSystem()). I initialize the memory with 0s and copy it to the device.

// allocate host memory for result
unsigned int* h_resBitVector = NULL;
h_resBitVector = (unsigned int *)malloc( mem_Size_BitVector ); // PAGED
//CU_SAFE_CALL( cuMemAllocSystem( (void**) &h_resBitVector, mem_Size_BitVector ) ); // PINNED

// initialize vector with 0
for(int i = 0; i < numOfBitVectorLines; ++i)
{
	h_resBitVector[i] = 0;
}

...

// allocate device memory for result
unsigned int* d_resBitVector = NULL;
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_resBitVector, mem_Size_BitVector));
// copy the zero-initialized host vector to the device
CUDA_SAFE_CALL( cudaMemcpy( d_resBitVector, h_resBitVector, mem_Size_BitVector, cudaMemcpyHostToDevice) );

...

In the “emuDebug” and “emuRelease” configurations everything is fine, but in the “Release” and “Debug” configurations only d_resBitVector[0] through d_resBitVector[63] are correctly filled with 0s. Starting from index 64 the value is 4026531840 (0xF0000000), which suggests to me that the memory is not initialized.

mem_Size_BitVector = 32768
threads = 512
blocks = 128

If I set the number of threads to 256, only d_resBitVector[0] through d_resBitVector[31] are correctly filled with 0s; starting from index 32 the value is 4026531840.
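With blocks = 128 and threads = 512 the launch creates 128 * 512 = 65536 threads, while the vector holds only 8192 unsigned ints (32768 bytes / 4). Since the original kernel is not shown, here is a minimal sketch, assuming a hypothetical kernel `writeLines` in which each thread owns one element: it computes a global index from blockIdx.x, blockDim.x and threadIdx.x and skips threads past the end. The helper `countCorrectLines` is also hypothetical, and the code uses the current runtime API (cudaDeviceSynchronize, cudaGetLastError) instead of the SDK's CUDA_SAFE_CALL macro.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its own global index into one element.
// Threads whose index falls past the end of the vector do nothing.
__global__ void writeLines(unsigned int* d_vec, unsigned int numLines)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numLines) {
        d_vec[idx] = idx;
    }
}

// Launches writeLines with the given geometry and returns how many elements
// hold the expected value afterwards, or -1 if any CUDA call fails.
int countCorrectLines(unsigned int numLines, unsigned int blocks, unsigned int threads)
{
    size_t bytes = sizeof(unsigned int) * numLines;
    unsigned int* h_vec = (unsigned int*)malloc(bytes);
    unsigned int* d_vec = NULL;
    if (h_vec == NULL || cudaMalloc((void**)&d_vec, bytes) != cudaSuccess) {
        free(h_vec);
        return -1;
    }

    // Fill the device buffer with 0xFFFFFFFF so elements no thread wrote stand out.
    cudaError_t err = cudaMemset(d_vec, 0xFF, bytes);
    if (err == cudaSuccess) {
        writeLines<<<blocks, threads>>>(d_vec, numLines);
        err = cudaGetLastError();                            // launch-time errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();   // errors while the kernel ran
    if (err == cudaSuccess) err = cudaMemcpy(h_vec, d_vec, bytes, cudaMemcpyDeviceToHost);

    int correct = -1;
    if (err == cudaSuccess) {
        correct = 0;
        for (unsigned int i = 0; i < numLines; ++i)
            if (h_vec[i] == i) ++correct;
    }
    cudaFree(d_vec);
    free(h_vec);
    return correct;
}

int main(void)
{
    // Same geometry as in the post: 128 blocks of 512 threads, 8192 elements.
    printf("%d of 8192 elements correct\n", countCorrectLines(8192, 128, 512));
    return 0;
}
```

With a geometry that covers the vector, every element should come back correct; a launch that is too small, such as countCorrectLines(8192, 4, 256), leaves everything past index 1023 untouched, which is the kind of "only the first N elements are right" symptom worth ruling out.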

Any suggestions? :wacko:

Thanks and best regards,

christoph

prkipfer · May 24, 2007, 11:37am

You didn’t tell us what numOfBitVectorLines is. If it is the correct size, or you replace the initialization with

memset(h_resBitVector,0,mem_Size_BitVector );

and still get nonzero values, the only explanation I can think of is that the addressing in your kernel is wrong.

Peter
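One way to separate the copy from the kernel, in the spirit of the memset suggestion, is a round trip with no kernel at all: zero the host buffer with memset, copy it to the device, copy it straight back, and count nonzero words. A minimal sketch, assuming a hypothetical helper `roundTripNonzeroCount` and the current runtime API rather than CUDA_SAFE_CALL. If it reports 0 while the full program does not, the values are most likely being changed by the kernel rather than by cudaMemcpy.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

// Copies a zero-filled host buffer to the device and straight back, with no
// kernel in between. Returns the number of nonzero words read back, or -1 if
// an allocation or copy fails.
int roundTripNonzeroCount(unsigned int numLines)
{
    size_t bytes = sizeof(unsigned int) * numLines;
    unsigned int* h_src = (unsigned int*)malloc(bytes);
    unsigned int* h_dst = (unsigned int*)malloc(bytes);
    unsigned int* d_buf = NULL;
    int nonzero = -1;

    if (h_src != NULL && h_dst != NULL &&
        cudaMalloc((void**)&d_buf, bytes) == cudaSuccess) {
        memset(h_src, 0, bytes);      // the initialization suggested in the reply
        memset(h_dst, 0xFF, bytes);   // poison the readback buffer so a failed copy is visible
        if (cudaMemcpy(d_buf, h_src, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
            cudaMemcpy(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            nonzero = 0;
            for (unsigned int i = 0; i < numLines; ++i)
                if (h_dst[i] != 0) ++nonzero;
        }
        cudaFree(d_buf);
    }
    free(h_src);
    free(h_dst);
    return nonzero;
}

int main(void)
{
    int n = roundTripNonzeroCount(8192);   // 8192 lines = 32768 bytes, as in the thread
    if (n < 0) printf("a CUDA call failed\n");
    else printf("%d nonzero words after the round trip\n", n);
    return 0;
}
```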

christoph · May 24, 2007, 11:57am

Sorry.

const unsigned int mem_Size_BitVector = sizeof(unsigned int) * numOfBitVectorLines;

with numOfBitVectorLines = 8192, i.e. 8192 * 4 bytes (sizeof(unsigned int)) = 32768 bytes.

I tried your memset snippet; it shows the same behavior as before. Your other point, about the addressing in the kernel, doesn't make sense to me:

a) In emulation mode everything works fine.

b) I got no errors when accessing d_resBitVector in the kernel: no kernel errors and no memory errors.
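"No kernel errors" only tells you something if errors are checked at the right points. Kernel launches are asynchronous: cudaGetLastError() right after the launch reports launch-time problems such as an invalid block size, and a synchronizing call such as cudaDeviceSynchronize() returns errors raised while the kernel ran. Neither will flag an out-of-range write that happens to land in memory the process owns; per-access checkers such as compute-sanitizer in current toolkits exist for that. Device emulation (removed from later toolkits) ran the threads on the CPU, so a kernel that works there does not prove its indexing is correct on the GPU. A minimal sketch, assuming hypothetical names `zeroLines` and `launchAndCheck`:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: bounds-checked write of 0 into each element.
__global__ void zeroLines(unsigned int* d_vec, unsigned int numLines)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numLines) d_vec[idx] = 0;
}

// Launches zeroLines and reports errors from both stages:
// cudaGetLastError() catches launch-time problems (e.g. an invalid block size),
// cudaDeviceSynchronize() waits for the kernel and returns errors raised while it ran.
cudaError_t launchAndCheck(unsigned int* d_vec, unsigned int numLines,
                           unsigned int blocks, unsigned int threads)
{
    zeroLines<<<blocks, threads>>>(d_vec, numLines);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}

int main(void)
{
    const unsigned int numLines = 8192;
    unsigned int* d_vec = NULL;
    if (cudaMalloc((void**)&d_vec, sizeof(unsigned int) * numLines) != cudaSuccess) return 1;

    cudaError_t ok  = launchAndCheck(d_vec, numLines, 128, 512);
    // 2048 threads per block exceeds the per-block limit of every CUDA GPU,
    // so this launch is rejected and cudaGetLastError() reports it.
    cudaError_t bad = launchAndCheck(d_vec, numLines, 128, 2048);
    printf("128 x 512:  %s\n", cudaGetErrorString(ok));
    printf("128 x 2048: %s\n", cudaGetErrorString(bad));

    cudaFree(d_vec);
    return 0;
}
```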

