cuMemAlloc and cuMemcpyDtoH limit?

I am trying to allocate some memory, and while the allocation succeeds, I can’t copy from it if the allocation is larger than 1.5MB.

I also tried using a huge allocation in my kernel and that works fine, but since I can’t copy the result back to main memory it is kind of pointless.

Is there any reason for that?

Most of my tests were done under XP64 using the MatrixMulDrv application, by changing the mem_size_C variable.


What GPU are you running on? Specifically, how much memory does it have? There are some CUDA issues with cards that only have 128MiB of RAM.

In principle, there should be no problems with large arrays on the cards. I’ve allocated ~400MiB arrays on my card with only 512MiB of RAM and had no issues, although some people have reported potential fragmentation issues with large numbers of allocations/deallocations.

I am using the GF8800GTX with 768MB of memory, so I had hoped that this would not be an issue.
But it might well be.
I also have 2 monitors connected to it, so they might be taking some memory in the same space, but I would think that would not be an issue with 768MB…

To be clear, the allocations themselves are not failing. What fails is the test in the sample or, in my real code, the cuMemcpyDtoH call.

Complete machine spec:
Quad QX6950, 8GB of RAM, XP64, PCIeGen2, GF8800GTX + GF8600, 3 monitors (2 on the GF8800 and 1 on the GF8600)

Hmm, interesting.

I have just added the lines
mem_size_C = 32 * 1024 * 1024;
CU_SAFE_CALL(cuMemAlloc(&d_C, mem_size_C));
in matrixMulDrv.cpp.

Works on GF8600 (512MB), fails on 8800GTX (768MB)…

Now I am even more confused…

No idea what is going on.
I deactivated the second head on the GF8800 and the test passed on it.
Then I reactivated the head, and once again it passed…

I am lost…