Hi, I am pretty new to CUDA programming, and I have run into an issue that puzzles me.
If I use normal malloc to allocate host memory and cudaMalloc to allocate device memory, everything works fine until I try to retrieve my results.
cudaMemcpy(hostPtr, devicePtr, size, cudaMemcpyDeviceToHost) fails with "cudaErrorIllegalAddress".
However, it only fails if “size” is larger than 1048576 bytes (2^20). With smaller sizes everything works fine.
If I use cudaMallocHost to allocate host memory instead, I immediately get a segmentation fault (SIGSEGV) in libc.so. In fact, the segmentation fault already occurs when I try to allocate more than 262144 bytes (2^18) with that function.
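The sequence described above can be sketched roughly as follows. This is a minimal illustration, not the actual program: the CHECK macro and the 2 MiB size are assumptions made for the example, and the kernel launch is left out. Checking every call matters here because kernel errors are reported asynchronously, so an illegal address raised inside a kernel can first show up at a later call such as cudaMemcpy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): print the error and exit.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

int main() {
    // Assumed size: 2 MiB, i.e. above the 1048576-byte threshold.
    const size_t size = 2 * 1048576;

    // Variant 1: pageable host memory from malloc.
    void *hostPtr = malloc(size);
    if (hostPtr == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }

    void *devicePtr = NULL;
    CHECK(cudaMalloc(&devicePtr, size));
    CHECK(cudaMemset(devicePtr, 0, size));

    // A kernel launch would go here. Checking right after it separates
    // errors raised by the kernel from errors raised by the copy itself.
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpy(hostPtr, devicePtr, size, cudaMemcpyDeviceToHost));

    // Variant 2: pinned host memory from cudaMallocHost.
    void *pinnedPtr = NULL;
    CHECK(cudaMallocHost(&pinnedPtr, size));
    CHECK(cudaMemcpy(pinnedPtr, devicePtr, size, cudaMemcpyDeviceToHost));

    CHECK(cudaFreeHost(pinnedPtr));
    CHECK(cudaFree(devicePtr));
    free(hostPtr);
    return 0;
}
```

If a stripped-down program like this runs cleanly at the same sizes, the allocation calls themselves are probably not the limit, and the kernel or the pointer handling around it would be the next place to look.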
Is there a limit or am I doing something wrong?
P.S.: The system I am working on runs a Linux 3.19 kernel and has a Quadro K2000 installed. I use CUDA Toolkit version 7.0.