My system: GTX 980, Ubuntu 14.04, Eclipse IDE
I’ve been attempting to allocate a chunk of pitched memory on the device with dimensions of 65536 bytes x 5000 rows, so a little over 300 MB. I have an error trap on the call to cudaMallocPitch and it does return success. However, when I try to access the memory in device code I get errors that cuda-memcheck has thus far been unable to diagnose. I’m assuming an allocation that large is causing unstable behavior.
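For context, here's a minimal sketch of the kind of allocation and access pattern I mean (simplified, not my actual code; the kernel just touches every byte so the names and dimensions are placeholders):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do {                                                   \
    cudaError_t e = (call);                                                \
    if (e != cudaSuccess) {                                                \
        fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,                 \
                cudaGetErrorString(e));                                    \
        return 1;                                                          \
    }                                                                      \
} while (0)

__global__ void touch(unsigned char *base, size_t pitch,
                      int width, int rows)
{
    int row = blockIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows && col < width) {
        // Rows must be addressed via the pitch returned by
        // cudaMallocPitch, not the requested width -- the driver
        // may pad each row for alignment.
        base[(size_t)row * pitch + col] = 1;
    }
}

int main()
{
    const size_t width = 65536;  // bytes per row
    const size_t rows  = 5000;
    unsigned char *d_buf = nullptr;
    size_t pitch = 0;

    CHECK(cudaMallocPitch((void **)&d_buf, &pitch, width, rows));
    printf("requested width %zu, pitch %zu\n", width, pitch);

    dim3 block(256, 1);
    dim3 grid((unsigned)((width + block.x - 1) / block.x), (unsigned)rows);
    touch<<<grid, block>>>(d_buf, pitch, (int)width, (int)rows);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(d_buf));
    return 0;
}
```

Note the `(size_t)row * pitch` cast: with 5000 rows of 64K+ bytes, a 32-bit intermediate in the offset arithmetic will overflow, which is one of the failure modes I've been trying to rule out.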
My questions are:
- Are there size limits on cudaMallocPitch that I don’t see in the documentation? I can sacrifice some width in the rows but I cannot drop any rows; the only alternative would be to cut it into multiple allocations with fewer rows. That’s doable, but eventually the code will be deployed to a multi-GPU environment, and I need to be certain that whatever size I end up with will always work.
- Could the fact that I am using the GTX 980 as the display card in my system (no on-board VGA) be causing the issue? i.e. if the card were solely for CUDA, could I assume the memory is basically “empty” until I call cudaMalloc?
- Do you have any suggestions for best practices on addressing this chunk of memory? The way I’m doing it now is to allocate a device memory pointer on the host and copy the pointer’s value to a symbol in constant memory, which is then read by the kernel. Yes, I know I could just pass the address as a parameter, but the kernel is fairly register-intensive and I figured saving a local variable/parameter slot was worth the extra read time (I’m not using much more than 4K of constant memory, so the cache should hold basically all of it). Also, if we were to break up the large chunk into smaller pieces, I would need to pass multiple parameters, and that could get cumbersome quickly.
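The constant-memory pattern I'm describing looks roughly like this (simplified sketch; symbol and kernel names are placeholders, and error checking is omitted for brevity):

```cuda
#include <cuda_runtime.h>

// Device pointer and pitch stored in constant memory; the kernel reads
// them through the constant cache instead of taking them as parameters.
__constant__ unsigned char *c_buf;
__constant__ size_t c_pitch;

__global__ void kernel(int width, int rows)
{
    int row = blockIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows && col < width)
        c_buf[(size_t)row * c_pitch + col] = 0;
}

int main()
{
    const size_t width = 65536, rows = 5000;
    unsigned char *d_buf = nullptr;
    size_t pitch = 0;
    cudaMallocPitch((void **)&d_buf, &pitch, width, rows);

    // Copy the *value* of the device pointer (and the pitch) into the
    // constant symbols on the host side, before launching the kernel.
    cudaMemcpyToSymbol(c_buf, &d_buf, sizeof(d_buf));
    cudaMemcpyToSymbol(c_pitch, &pitch, sizeof(pitch));

    kernel<<<dim3((unsigned)(width / 256), (unsigned)rows), 256>>>(
        (int)width, (int)rows);
    cudaDeviceSynchronize();
    cudaFree(d_buf);
    return 0;
}
```

If the big chunk had to be split, I'd presumably replace `c_buf` with a small `__constant__` array of pointers rather than adding one kernel parameter per piece.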