My GTX 295 reports about 220 MB for CL_DEVICE_MAX_MEM_ALLOC_SIZE, which is in line with the OpenCL spec’s minimum value of one quarter of CL_DEVICE_GLOBAL_MEM_SIZE. In practice, however, I’ve been able to allocate and use buffers as large as 512 MB, more than twice the reported maximum, with correct results.
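What’s up with this? Is the reported value a limit the driver actually enforces, or just a size it guarantees will work?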
Also, is there any way to determine how much memory can be allocated and resident on the device all at once? Since my algorithm is a global scatter, the only way to break it into segments is to rerun the entire program once per segment, discarding (clipping) every point that does not fall into the segment currently in memory. Splitting the problem into the fewest segments that each fit into memory is therefore critical to my program’s performance at large problem sizes.
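To make the segmentation concrete, here is a minimal sketch of the arithmetic I have in mind. The helper name `min_segments` and its parameters are hypothetical, and it assumes each output point has a fixed size in bytes. The budget would be whatever size can really be kept resident, which is exactly the number I don't know how to query. CL_DEVICE_MAX_MEM_ALLOC_SIZE from clGetDeviceInfo is only the conservative candidate.

```c
#include <stdint.h>

/* Hypothetical helper: fewest segments (full passes over the input) such
 * that one segment of the output fits in budget_bytes.
 * Returns 0 if there are no points, or if the budget cannot hold even a
 * single point (no valid split exists). */
uint64_t min_segments(uint64_t num_points,
                      uint64_t bytes_per_point,
                      uint64_t budget_bytes)
{
    if (bytes_per_point == 0 || budget_bytes < bytes_per_point) {
        return 0;
    }

    /* How many output points fit in one resident segment. */
    uint64_t points_per_segment = budget_bytes / bytes_per_point;

    /* Ceiling division, written so it cannot overflow. */
    return num_points / points_per_segment
         + (num_points % points_per_segment != 0 ? 1 : 0);
}
```

For a 512 MB output of 4-byte points, a 220 MB budget gives three passes over the whole input, while a 512 MB budget gives one. That is why the real limit matters so much here.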