Solved: Memory Allocation Problems

Hi, I am pretty new to CUDA programming, but I have run into an issue that somewhat puzzles me.

If I use normal malloc to allocate host memory and cudaMalloc to allocate device memory, everything works fine until I try to retrieve my results.

cudaMemcpy(hostPtr, devicePtr, size, cudaMemcpyDeviceToHost) fails with "cudaErrorIllegalAddress".

However, it only fails if “size” is larger than 1048576 bytes; with smaller sizes, everything works fine.
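Simplified down to the bare pattern, what I do looks roughly like this (placeholder names and a placeholder kernel; my real code does more):

```
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Placeholder kernel; my real kernels do more work.
__global__ void dummyKernel(float *data, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main()
{
    const size_t n = 1 << 20;               // number of elements
    const size_t size = n * sizeof(float);  // > 1048576 bytes, where it fails for me

    float *hostPtr = (float *)malloc(size);
    float *devicePtr = NULL;
    cudaMalloc((void **)&devicePtr, size);

    cudaMemcpy(devicePtr, hostPtr, size, cudaMemcpyHostToDevice);
    dummyKernel<<<(n + 255) / 256, 256>>>(devicePtr, n);

    // This is the call that returns cudaErrorIllegalAddress for large sizes.
    cudaError_t err = cudaMemcpy(hostPtr, devicePtr, size, cudaMemcpyDeviceToHost);
    printf("cudaMemcpy: %s\n", cudaGetErrorString(err));

    cudaFree(devicePtr);
    free(hostPtr);
    return 0;
}
```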

If I use cudaMallocHost to allocate the host memory instead, I immediately get a segmentation fault (SIGSEGV) in libc.so. In fact, the segmentation fault already occurs when I try to allocate more than 262144 bytes (2^18) with that function.
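Again simplified, the pinned-memory variant with error checking looks like this (the size here is just a placeholder above the threshold where it crashes for me):

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t size = 1 << 20;   // anything above 262144 bytes crashes for me
    float *hostPtr = NULL;

    // Pinned host allocation; this is where the SIGSEGV shows up.
    cudaError_t err = cudaMallocHost((void **)&hostPtr, size);
    printf("cudaMallocHost: %s\n", cudaGetErrorString(err));

    if (err == cudaSuccess)
        cudaFreeHost(hostPtr);
    return 0;
}
```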

Is there a limit or am I doing something wrong?

P.S.: The system I am working on runs a Linux 3.19 kernel and has a Quadro K2000 installed. I use CUDA Toolkit version 7.0.

This line of code is probably executed after one or more kernel calls.

The cudaErrorIllegalAddress is an error returned from a kernel operation; it has nothing to do with this specific cudaMemcpy operation. CUDA runtime API calls can return either an error associated with the call itself or any asynchronous error that occurred previously.

When you increase “size”, it probably also has some sort of effect on the kernel operation, which is impossible to diagnose from the one line of code you have shown.
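One way to see where the error really comes from is to check the status after the kernel launch, after a device synchronization, and after the memcpy separately. A minimal sketch (the deliberately broken kernel is only there to force the illegal address; names are placeholders):

```
#include <cstdio>
#include <cuda_runtime.h>

// Deliberately broken kernel: writes far outside the allocation to
// provoke cudaErrorIllegalAddress during kernel execution.
__global__ void brokenKernel(int *data)
{
    const size_t hugeOffset = (size_t)1 << 28;   // ~1 GiB past the buffer
    data[threadIdx.x + hugeOffset] = 42;
}

int main()
{
    int *d = NULL;
    int h[32] = {0};
    cudaMalloc((void **)&d, sizeof(h));

    brokenKernel<<<1, 32>>>(d);
    // Error from the launch itself (bad configuration etc.):
    printf("launch:  %s\n", cudaGetErrorString(cudaGetLastError()));
    // Asynchronous error from the kernel's execution:
    printf("execute: %s\n", cudaGetErrorString(cudaDeviceSynchronize()));

    // The same (sticky) error is reported again by the next runtime call,
    // even though this memcpy itself is perfectly fine:
    printf("memcpy:  %s\n", cudaGetErrorString(
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost)));

    cudaFree(d);
    return 0;
}
```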

You might use the method described in the answer here:

cuda - Unspecified launch failure on Memcpy - Stack Overflow

to identify a specific line of your kernel code that is causing the illegal address error.

Thanks for that hint. I finally figured out what went wrong. Apparently, I sometimes wrote beyond the boundaries of the allocated memory (I multiplied the number of elements by the element size twice). So I clobbered memory areas that I shouldn’t have touched, and CUDA did not like it.
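In other words, the mistake had roughly this shape (illustrative, not my exact code):

```
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    size_t n = 1 << 18;                  // number of elements
    size_t size = n * sizeof(float);     // size in BYTES

    float *hostPtr = (float *)malloc(size);
    float *devicePtr = NULL;
    cudaMalloc((void **)&devicePtr, size);   // n floats allocated, correct

    // Bug: "size" is already in bytes, so multiplying by sizeof(float) again
    // reads/writes 4x as much memory as was allocated and tramples over
    // neighbouring allocations.
    cudaMemcpy(devicePtr, hostPtr, size * sizeof(float), cudaMemcpyHostToDevice);

    cudaFree(devicePtr);
    free(hostPtr);
    return 0;
}
```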