hi all,
i have a problem with cudaMemcpy(). i allocate memory on the host for my result with malloc() [i also tried cuMemAllocSystem()]. i initialize the memory with "0"s and copy it to the device.
// allocate host memory for result
unsigned int* h_resBitVector = NULL;
h_resBitVector = (unsigned int *)malloc( mem_Size_BitVector ); // PAGED
//CU_SAFE_CALL( cuMemAllocSystem( (void**) &h_resBitVector, mem_Size_BitVector ) ); // PINNED
// initialize vector with 0
for(int i = 0; i < numOfBitVectorLines; ++i)
{
    h_resBitVector[i] = 0;
}
...
// allocate device memory for result
unsigned int* d_resBitVector = NULL;
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_resBitVector, mem_Size_BitVector));
CUDA_SAFE_CALL( cudaMemcpy( d_resBitVector, h_resBitVector, mem_Size_BitVector, cudaMemcpyHostToDevice) );
...
in the “emuDebug” and “emuRelease” modes everything is fine, but in the “Release” and “Debug” modes only d_resBitVector[0] to d_resBitVector[63] are correctly filled with "0"s. starting from index 64 the value is “4026531840” (0xF0000000), which suggests to me that the memory is not initialized.
mem_Size_BitVector = 32768
threads = 512
blocks = 128
if i set the number of threads to 256, only d_resBitVector[0] to d_resBitVector[31] are correctly filled with "0"s. starting from index 32 the value is “4026531840”.
any suggestions?
thanks and best regards,
christoph