Why is NVIDIA OpenCL CL_DEVICE_MAX_MEM_ALLOC_SIZE (allocatable memory) never more than 25% of CL_DEVICE_GLOBAL_MEM_SIZE, when other platforms are sometimes 50%, 70% or even 100%? If it is because of a mistaken interpretation of the OpenCL 1.2 standard, there may be an opportunity for NVIDIA to increase the memory available to applications. If it is instead an NVIDIA-specific constraint and is not arbitrary, what is the root cause?
In the OpenCL specification, CL_DEVICE_MAX_MEM_ALLOC_SIZE reports the largest single memory object that an application can allocate on a device. Quoting the spec:
“CL_DEVICE_MAX_MEM_ALLOC_SIZE - Max size of memory object allocation in bytes. The minimum value is max (1/4th of CL_DEVICE_GLOBAL_MEM_SIZE, 128*1024*1024) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.”
This phrasing is potentially confusing. Paraphrased, I interpret it as follows:
“When implementing OpenCL, CL_DEVICE_MAX_MEM_ALLOC_SIZE must be set in order to inform applications about maximum allocatable memory. To be compliant with this specification, this maximum must be at least one fourth of the device’s physical memory (CL_DEVICE_GLOBAL_MEM_SIZE) or 128 binary megabytes (128*1024*1024 bytes), whichever is greater.”
This defines the minimum for CL_DEVICE_MAX_MEM_ALLOC_SIZE. Notably, the specification does not define how to calculate a maximum for CL_DEVICE_MAX_MEM_ALLOC_SIZE.
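To make that reading concrete, here is a minimal Python sketch of the lower bound as I interpret the quoted sentence. The function names are hypothetical, and rounding the one-fourth term down with integer division is my assumption; the spec text does not say how to round.

```python
# Sketch of the lower bound from the quoted spec sentence.
# Function names are hypothetical; integer (floor) division is an assumption.

MIN_ALLOC_FLOOR_BYTES = 128 * 1024 * 1024  # the 128*1024*1024 term in the spec


def spec_minimum_max_alloc(global_mem_size: int) -> int:
    """Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE a non-custom device may report."""
    return max(global_mem_size // 4, MIN_ALLOC_FLOOR_BYTES)


def meets_spec_minimum(max_mem_alloc_size: int, global_mem_size: int) -> bool:
    """True if a reported value is at or above the spec's lower bound."""
    return max_mem_alloc_size >= spec_minimum_max_alloc(global_mem_size)
```

Under this reading, a device with 8 GiB of global memory must report at least 2 GiB, but reporting 4 GiB or the full 8 GiB would satisfy the same check, because the sentence only sets a floor.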
Unless there is an undocumented additional NVIDIA-only constraint, for all known NVIDIA OpenCL-compatible cards that I surveyed, allocatable memory appears to be artificially constrained to only 25% of physical memory. It is sometimes less, but it appears to never be more.
NVIDIA appears to be alone in this. Intel, AMD, and pocl implementations sometimes exceed the 25% mark. Some AMD implementations appear to work from a higher limit, sometimes 50% or 70% of memory. Some pocl implementations appear to make the maximum physical amount fully available for OpenCL.
Other messages in this forum have asked this question, and people point to the spec as the reason, but this is based on a flawed interpretation of the spec.
- See https://gist.github.com/roycewilliams/5ac28350023613c614034c7fb6ba715d for a survey of discovered values for many hardware platforms.
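For anyone who wants to check their own hardware, the two values can be read with a short script. This is only a sketch and assumes the third-party pyopencl package is installed; it is not necessarily how the survey above was collected, and the helper names `allocatable_percent` and `survey` are hypothetical.

```python
# Sketch: report CL_DEVICE_MAX_MEM_ALLOC_SIZE as a percentage of
# CL_DEVICE_GLOBAL_MEM_SIZE for every OpenCL device found.
# Assumption: pyopencl is installed; helper names are hypothetical.


def allocatable_percent(max_mem_alloc_size: int, global_mem_size: int) -> float:
    """Percentage of global memory usable in a single allocation."""
    return 100.0 * max_mem_alloc_size / global_mem_size


def survey():
    """Return (platform, device, global bytes, max alloc bytes, percent) rows."""
    import pyopencl as cl  # imported here so the helper above works without it

    rows = []
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            rows.append((
                platform.name,
                device.name,
                device.global_mem_size,
                device.max_mem_alloc_size,
                allocatable_percent(device.max_mem_alloc_size,
                                    device.global_mem_size),
            ))
    return rows
```

Calling `survey()` and printing each row gives one line per device. If the pattern described above holds, NVIDIA devices should show about 25% or less in the last column.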