kng
April 8, 2010, 7:37am
1
So guys, I am using cudaMemcpy in a loop just to test what is going on inside the kernel, and I am getting very strange results.
The first time I call cudaMemcpy the result is correct; the second and every iteration afterwards seems to give corrupted values:
gpuAssert( cudaMemcpy(temp, device_largestElements, sizeof(float), cudaMemcpyDeviceToHost) );
where temp is a float array containing just one element and device_largestElements is a device array containing 22332 floats.
I only want to read the first element of the array. Is it possible that cudaMemcpy is misbehaving because of the difference in array sizes?
(By the way, if I remove the cudaMemcpy from inside the loop and just call it once after the loop, the result is correct.)
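
For context, this is roughly the shape of the code I mean (a simplified sketch only; the kernel, loop count, and the gpuAssert definition here are placeholders, not my actual code):

#include <cstdio>
#include <cuda_runtime.h>

// Placeholder error-check helper; my real gpuAssert is defined elsewhere.
#define gpuAssert(call)                                                     \
    do {                                                                    \
        cudaError_t err = (call);                                           \
        if (err != cudaSuccess)                                             \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                     \
                    cudaGetErrorString(err), __FILE__, __LINE__);           \
    } while (0)

// Stand-in for my real kernel: each block writes one result element.
__global__ void findLargest(float *largest)
{
    if (threadIdx.x == 0)
        largest[blockIdx.x] = 42.0f;   // placeholder value
}

int main()
{
    const int numBlocks = 22332;
    float *device_largestElements = 0;
    gpuAssert( cudaMalloc(&device_largestElements, numBlocks * sizeof(float)) );

    float temp[1];
    for (int i = 0; i < 10; ++i) {
        findLargest<<<numBlocks, 256>>>(device_largestElements);
        gpuAssert( cudaGetLastError() );        // catch launch errors
        gpuAssert( cudaDeviceSynchronize() );   // catch kernel execution errors

        // Copying a single float out of a 22332-element device array:
        // cudaMemcpy only cares about the pointer and the byte count.
        gpuAssert( cudaMemcpy(temp, device_largestElements,
                              sizeof(float), cudaMemcpyDeviceToHost) );
        printf("iteration %d: %f\n", i, temp[0]);
    }

    gpuAssert( cudaFree(device_largestElements) );
    return 0;
}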
Almost certainly not. In most of my linear algebra codes, I run my own GPU memory manager which looks after a gigantic chunk of pre-allocated device memory for the life of the application. My codes wind up doing the equivalent of
cudaMemcpy(host_chunk, device_chunk+random_offset, (size_t)random_size * sizeof(random_type), cudaMemcpyDeviceToHost)
all day long (literally thousands of times over hours/days) and never miss a beat, so anecdotally I would suggest you look somewhere else for your problem.
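
For example, a stripped-down version of what I mean (the names, sizes, and offsets below are made up for illustration, not taken from my real code):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    const size_t chunkElems = 1 << 20;               // one big device allocation
    float *device_chunk = 0;
    cudaMalloc(&device_chunk, chunkElems * sizeof(float));
    cudaMemset(device_chunk, 0, chunkElems * sizeof(float));

    float host_buf[256];
    for (int i = 0; i < 1000; ++i) {
        size_t offset = rand() % (chunkElems - 256); // arbitrary sub-range
        size_t count  = 1 + rand() % 256;            // arbitrary copy size

        // Partial copies out of a much larger allocation are perfectly legal:
        // cudaMemcpy only needs a valid source pointer and a byte count that
        // stays within the allocation.
        cudaError_t err = cudaMemcpy(host_buf, device_chunk + offset,
                                     count * sizeof(float),
                                     cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) {
            fprintf(stderr, "copy %d failed: %s\n", i, cudaGetErrorString(err));
            break;
        }
    }

    cudaFree(device_chunk);
    return 0;
}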
kng
April 8, 2010, 8:49am
3
Almost certainly not. In most of my linear algebra codes, I run my own GPU memory manager which looks after a gigantic chunk of pre-allocated device memory for the life of the application. My codes wind up doing the equivalent of
cudaMemcpy(host_chunk, device_chunk+random_offset, (size_t)random_size * sizeof(random_type), cudaMemcpyDeviceToHost)
all day long (literally thousands of times over hours/days) and never miss a beat, so anecdotally I would suggest you look somewhere else for your problem.
You made me laugh with the emphasis and effort you put in to prove a point :D I needed that, thanks.