We’re currently researching NVIDIA GPUs with large memory capacities for real-time computer vision applications using CUDA. Does the NVIDIA A16 behave like 4 individual GPUs, or can it also be used as a unified GPU? Can a single CUDA program access all 4x1280 (5120) CUDA cores and all 4x16 GB (64 GB) of memory?
According to the datasheet I understand that it is designed for virtualization tasks. Does this mean that it presents itself to the operating system as 4 separate GPUs?
For anyone else with this question, it was kindly answered by NVIDIA at the NPN Partner Day:
A16 is 4 individual A2 GPUs (GA107) packed on a single board for higher density. Hence, it will also show up as 4 separate GPUs to the OS/applications. It cannot be combined to act as a unified GPU.
Thanks for helping others by posting the answer. I shall mark it answered as well.
Just as if you had 4 discrete GPUs, it’s possible to write your CUDA application to detect and launch kernels on multiple GPUs, but each would have its own memory space. So if your application can scale in such a manner, you may be able to leverage all 4 GPUs concurrently.
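As a rough illustration of that pattern, here is a minimal sketch using the standard CUDA runtime API: enumerate the devices (an A16 should report 4), then launch independent work on each one with `cudaSetDevice`. The `fill` kernel and buffer sizes are placeholders, not anything specific to A16.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder workload: each GPU fills its own buffer with its device index.
__global__ void fill(int *buf, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);          // an A16 board should report 4 devices
    printf("Found %d CUDA device(s)\n", count);

    const int N = 1 << 20;
    int *bufs[16] = {nullptr};

    // Kernel launches are asynchronous, so the loop queues work on all
    // GPUs before any of them necessarily finishes.
    for (int dev = 0; dev < count && dev < 16; ++dev) {
        cudaSetDevice(dev);                          // make this GPU current
        cudaMalloc(&bufs[dev], N * sizeof(int));     // lives in this GPU's own 16 GB
        fill<<<(N + 255) / 256, 256>>>(bufs[dev], N, dev);
    }

    // Wait for each GPU to finish, then release its buffer.
    for (int dev = 0; dev < count && dev < 16; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cudaFree(bufs[dev]);
    }
    return 0;
}
```

Note that each allocation is only visible to the GPU it was made on; moving data between the four GPUs would require explicit copies (e.g. `cudaMemcpyPeer`), which is why the application has to be written to partition its work explicitly.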