I have a simple application that allocates memory blocks of size 4096 x 1024 x N (I tried cudaMalloc3D). Around N = 425 I get “out of memory” errors (running on a 4 GB Tesla S1070). Using cudaMallocPitch with 2D arrays, I hit a similar limit around 1.7 GB. Is this a known limitation? Has anyone managed to allocate larger blocks? Is there any documentation about this limitation?
Thanks && kind regards
Edit: I forgot to mention that I’m running on Windows Server 2008 with CUDA 2.2.
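
For reference, here is a rough sketch of the size arithmetic behind those numbers. It assumes one-byte elements, which is not stated above, and it ignores any row padding that cudaMallocPitch or cudaMalloc3D may add. The helper names block_bytes and max_depth are made up for illustration and are not part of the CUDA API. Under that assumption, N = 425 comes to 1,782,579,200 bytes (about 1.66 GiB), which lines up with the roughly 1.7 GB limit seen in the 2D case.

```c
#include <stdint.h>

/* Hypothetical helper: unpadded size in bytes of a
   width x height x depth block of elem_size-byte elements.
   64-bit arithmetic avoids overflow for multi-gigabyte sizes. */
static uint64_t block_bytes(uint64_t width, uint64_t height,
                            uint64_t depth, uint64_t elem_size)
{
    return width * height * depth * elem_size;
}

/* Hypothetical helper: largest depth N for which the unpadded
   block still fits into limit_bytes. */
static uint64_t max_depth(uint64_t width, uint64_t height,
                          uint64_t elem_size, uint64_t limit_bytes)
{
    return limit_bytes / (width * height * elem_size);
}
```

With these helpers, block_bytes(4096, 1024, 425, 1) gives the size at which the failure shows up, and max_depth(4096, 1024, 1, limit) gives the N one would expect to reach for a given limit in bytes. Whether the whole 4 GB can actually be handed out as one contiguous block is exactly the open question here.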