In CUDA programming, is there an upper limit on the size of a single allocation? I am using a Quadro GV100 GPU with 32GB of memory. When I allocate about 9GB for a single pointer, CUDA reports "out of memory". However, if I split that 9GB evenly between two pointers, no error occurs. Why does this happen?
There is no limit like that. More information would probably be needed, such as a complete code example that reproduces the problem. A common mistake is using an integer type that is too small for the size calculation when handling large numbers. I'm not saying that's what is going on here; it's just one example of a coding error that can cause strange behavior on allocation attempts.
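As a general defensive pattern (a sketch, not part of the CUDA API), the byte count for a large allocation can be computed in std::size_t with an explicit overflow check before it is handed to cudaMalloc. The helper name checked_bytes and the convention of returning 0 on overflow are assumptions made for illustration.

```cpp
#include <cstddef>
#include <initializer_list>
#include <limits>

// Hypothetical helper (not part of the CUDA API): returns
// elem_size * d0 * d1 * ..., computed entirely in std::size_t,
// or 0 if any intermediate product would exceed SIZE_MAX.
constexpr std::size_t checked_bytes(std::size_t elem_size,
                                    std::initializer_list<std::size_t> dims) {
    std::size_t total = elem_size;
    for (std::size_t d : dims) {
        // Test for overflow before multiplying, so no wraparound occurs.
        if (d != 0 && total > std::numeric_limits<std::size_t>::max() / d) {
            return 0;
        }
        total *= d;
    }
    return total;
}
```

A caller would check the result for 0 before calling cudaMalloc. Non-constant int arguments inside the braces need a static_cast<std::size_t>, because list-initialization rejects narrowing conversions; that forces the conversion to size_t to be written out explicitly.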
Another possibility is fragmentation, although I would expect that to be unusual. To rule it out, test your code by allocating 9GB on an otherwise idle GPU at the very start of your program. If that succeeds but the same allocation later fails, fragmentation could be the issue.
```cpp
for (GPUN = 0; GPUN < device_count; ++GPUN)
{
    cudaSetDevice(GPUN);
    // Two buffers of 15 components each per GPU
    CHECK(cudaMalloc((void**)&d_wwt[GPUN], nxinner * njmax * nkmax * 15 * sizeof(float)));
    CHECK(cudaMalloc((void**)&d_wwtfine[GPUN], nxinner * njmax * nkmax * 15 * sizeof(float)));
}
```
This code runs correctly.
```cpp
for (GPUN = 0; GPUN < device_count; ++GPUN)
{
    cudaSetDevice(GPUN);
    // Same total size, split unevenly: 29 components and 1 component
    CHECK(cudaMalloc((void**)&d_wwt[GPUN], nxinner * njmax * nkmax * 29 * sizeof(float)));
    CHECK(cudaMalloc((void**)&d_wwtfine[GPUN], nxinner * njmax * nkmax * 1 * sizeof(float)));
}
```
However, this code fails: the first cudaMalloc call returns an out-of-memory error.
nxinner × njmax × nimax = 800 × 400 × 300
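Assuming nimax is nkmax, the first allocation in the failing version computes 800 × 400 × 300 × 29 = 2,784,000,000.

That value is larger than the maximum of a signed 32-bit int (2,147,483,647). This is the integer type issue I referred to. The multiplications are evaluated left to right, and each one converts only its own two operands to a common type, so (assuming nxinner, njmax and nkmax are int) nxinner * njmax * nkmax * 29 is computed entirely in int and overflows before sizeof(float) is ever reached. Signed overflow is undefined behavior; in practice the result typically wraps to a negative number, which becomes an enormous value when converted to size_t, hence the out-of-memory error. With 15 instead of 29 the product is 1,440,000,000, which still fits, so the two-pointer version works. Since sizeof() returns a size_t (a 64-bit unsigned integer on 64-bit platforms), put it first so that every multiplication is done in size_t. Try this instead: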
```cpp
CHECK(cudaMalloc((void**)&d_wwt[GPUN], sizeof(float) * nxinner * njmax * nkmax * 29));
```
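To make the arithmetic concrete, here is a host-only sketch using the dimensions from this thread. It assumes, as above, that nkmax equals nimax and that the dimension variables are int; the function names are hypothetical. It evaluates the intended element count in 64-bit arithmetic instead of reproducing the overflow, because signed overflow is undefined behavior.

```cpp
#include <climits>
#include <cstddef>

// Dimensions from the thread; nkmax is assumed to equal nimax.
constexpr int nxinner = 800;
constexpr int njmax = 400;
constexpr int nkmax = 300;

// Element count evaluated in long long (1LL forces 64-bit arithmetic),
// i.e. the value the int-only expression was supposed to produce.
constexpr long long element_count(int ncomp) {
    return 1LL * nxinner * njmax * nkmax * ncomp;
}

// True if nxinner * njmax * nkmax * ncomp, computed in int, would stay
// within range (all factors are positive, so checking the final product
// is enough).
constexpr bool fits_in_int(int ncomp) {
    return element_count(ncomp) <= INT_MAX;
}

// Reordered byte count: sizeof(float) comes first, so each int operand is
// converted to size_t before it is multiplied.
constexpr std::size_t bytes_for(int ncomp) {
    return sizeof(float) * nxinner * njmax * nkmax * ncomp;
}
```

Here fits_in_int(15) is true and fits_in_int(29) is false, matching the behavior reported above. Declaring nxinner, njmax and nkmax as size_t would also avoid the problem, independent of operand order.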