cudaMemcpy() works in emu mode, but not in release mode

Hi all,

I have a problem with cudaMemcpy(). I allocate memory on the host for my result with malloc() (I also tried cuMemAllocSystem()). I initialize the memory with zeros and copy it to the device.

// allocate host memory for result
unsigned int* h_resBitVector = NULL;
h_resBitVector = (unsigned int *)malloc( mem_Size_BitVector ); // PAGED
//CU_SAFE_CALL( cuMemAllocSystem( (void**) &h_resBitVector, mem_Size_BitVector ) ); // PINNED

// initialize vector with 0
for(int i = 0; i < numOfBitVectorLines; ++i)
{
	h_resBitVector[i] = 0;
}

// allocate device memory for result
unsigned int* d_resBitVector = NULL;
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_resBitVector, mem_Size_BitVector));
CUDA_SAFE_CALL( cudaMemcpy( d_resBitVector, h_resBitVector, mem_Size_BitVector, cudaMemcpyHostToDevice) );


In the “emuDebug” and “emuRelease” modes everything is fine, but in “Release” and “Debug” mode only d_resBitVector[0] to d_resBitVector[63] are correctly filled with zeros. Starting from index 64 the value is “4026531840” (which to me means that the memory is not initialized).

mem_Size_BitVector = 32768

threads = 512

blocks = 128

If I set the number of threads to 256, only d_resBitVector[0] to d_resBitVector[31] are correctly filled with zeros. Starting from index 32 the value is “4026531840”.
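As a side note, 4026531840 is 0xF0000000 in hexadecimal, i.e. an unsigned int with only its top four bits set. Below is a minimal host-side C sketch to confirm that. The helper names `count_set_bits` and `describe_value` are made up for illustration and are not part of the code above, and a 32-bit unsigned int is assumed.

```c
#include <stdio.h>

/* Hypothetical helper: count the bits set in an unsigned int. */
static unsigned int count_set_bits(unsigned int v)
{
    unsigned int n = 0;
    while (v != 0) {
        n += v & 1u;   /* add the lowest bit */
        v >>= 1;       /* move on to the next bit */
    }
    return n;
}

/* Hypothetical helper: print a value in decimal and hex with its bit count. */
static void describe_value(unsigned int v)
{
    printf("%u = 0x%08X, %u bit(s) set\n", v, v, count_set_bits(v));
}
```

Calling `describe_value(4026531840u)` prints the hex form next to the decimal one. A regular pattern like this may hint that something actually wrote the value, rather than the memory simply being left untouched, but that is only a guess.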

Any suggestions?

Thanks and best regards,


You didn’t tell us what numOfBitVectorLines is. If it is the correct size, or you replace the initialization with

memset(h_resBitVector,0,mem_Size_BitVector );

and still get nonzero values, the only thing I can think of is that your addressing in the kernel is wrong.
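One way to act on this suggestion is to zero the host buffer with memset and then scan a buffer (for instance, one copied back from the device right after the cudaMemcpy and before any kernel runs) for the first nonzero entry. Below is a minimal host-only C sketch. `alloc_zeroed` and `first_nonzero_index` are hypothetical helper names, and the copy back from the device is assumed to happen elsewhere.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a host buffer of `bytes` bytes and zero it. */
static unsigned int *alloc_zeroed(size_t bytes)
{
    unsigned int *buf = (unsigned int *)malloc(bytes);
    if (buf != NULL)
        memset(buf, 0, bytes);   /* memset takes a size in bytes, not elements */
    return buf;
}

/* Hypothetical helper: index of the first nonzero element, or -1 if all are zero. */
static long first_nonzero_index(const unsigned int *buf, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        if (buf[i] != 0)
            return (long)i;
    }
    return -1;
}
```

If the scan returns -1 immediately after the copy but a positive index after the kernel has run, that would point at the kernel rather than at cudaMemcpy.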



const unsigned int mem_Size_BitVector = sizeof(unsigned int) * numOfBitVectorLines;

with “numOfBitVectorLines = 8192” (8192 * 4 = 32768, since sizeof(unsigned int) is 4)
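The distinction between element count and byte count matters here: the initialization loop is bounded in elements, while malloc, cudaMalloc and cudaMemcpy take sizes in bytes. Below is a small C sketch of the conversion in both directions. The helper names are illustrative only, and the 32768 figure assumes a 4-byte unsigned int.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for `num_lines` unsigned int entries. */
static size_t bitvector_bytes(size_t num_lines)
{
    return num_lines * sizeof(unsigned int);
}

/* Hypothetical helper: number of unsigned int entries that fit in `num_bytes`. */
static size_t bitvector_lines(size_t num_bytes)
{
    return num_bytes / sizeof(unsigned int);
}
```

With a 4-byte unsigned int, `bitvector_bytes(8192)` gives 32768 and `bitvector_lines(32768)` gives 8192, which matches the values quoted in this thread.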

I tried your “memset” snippet. It shows the same behavior as before. The other point you mentioned, concerning the addressing in the kernel, makes no sense to me.

a.) In emu mode everything works fine.

b.) I get no errors when accessing “d_resBitVector” in the kernel: no kernel errors and no memory errors.