How many GPUs can CUDA support?

I mean, can CUDA support more than one graphics card (e.g. two graphics cards plugged into one mainboard, but WITHOUT SLI)? And if it can, how does it work? How does it distribute the computing tasks?

CUDA can support at least 16 GPUs in a single system, although system and/or OS limitations may reduce that number. The DGX-2 product from NVIDIA, for example, has 16 V100 GPUs in a single server running Linux.

The CUDA runtime API provides functions to switch from one GPU to another within a single application, so that you can copy data to or from a specific GPU and/or launch work on a specific GPU.
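To make that concrete, here is a minimal sketch of enumerating the GPUs and switching between them with `cudaGetDeviceCount` and `cudaSetDevice`. The helper name `countDevices` and the buffer size are just illustrative choices, not anything prescribed by CUDA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of visible CUDA devices, or 0 if the query fails.
int countDevices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;
    }
    return count;
}

// Visit every GPU in turn and do a small allocation on each one.
void touchEveryDevice() {
    int deviceCount = countDevices();
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;
        }

        // After this call, allocations, copies and kernel launches issued
        // by this host thread target GPU number `dev`.
        cudaSetDevice(dev);

        float* d_buf = nullptr;
        if (cudaMalloc(&d_buf, 1024 * sizeof(float)) == cudaSuccess) {
            std::printf("GPU %d (%s): allocated a buffer\n", dev, prop.name);
            cudaFree(d_buf);
        }
    }
}
```

Calling `touchEveryDevice()` from `main` should print one line per GPU. No SLI is involved here; each card is simply a separate device index.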

The basic CUDA language does not automatically distribute work across GPUs; that is the programmer’s responsibility, just as CPU code does not automatically distribute itself across multiple CPU cores unless the programmer takes steps to make that happen.
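As a rough illustration of doing that distribution by hand, the sketch below splits one array into near-equal chunks and gives each GPU its own chunk. The kernel `scaleKernel`, the helper `chunkBegin`, and the function `scaleOnAllGpus` are made-up names for this example, and error checking is mostly omitted to keep it short.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each element of a chunk by `factor`.
__global__ void scaleKernel(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Start offset of chunk `part` when n elements are split into numParts
// near-equal chunks; chunkBegin(n, numParts, numParts) equals n.
size_t chunkBegin(size_t n, int numParts, int part) {
    size_t base = n / numParts, rem = n % numParts;
    return part * base + std::min<size_t>(part, rem);
}

// Scales `host` in place using every visible GPU; returns the GPU count used.
int scaleOnAllGpus(std::vector<float>& host, float factor) {
    int numGpus = 0;
    if (cudaGetDeviceCount(&numGpus) != cudaSuccess || numGpus == 0) return 0;
    const size_t n = host.size();
    std::vector<float*> dev(numGpus, nullptr);

    // Phase 1: copy each chunk to its GPU and launch the kernel there.
    // Kernel launches are asynchronous, so earlier GPUs keep computing
    // while the host sets up the later ones.
    for (int g = 0; g < numGpus; ++g) {
        size_t begin = chunkBegin(n, numGpus, g);
        size_t count = chunkBegin(n, numGpus, g + 1) - begin;
        if (count == 0) continue;
        cudaSetDevice(g);
        cudaMalloc(&dev[g], count * sizeof(float));
        cudaMemcpy(dev[g], host.data() + begin, count * sizeof(float),
                   cudaMemcpyHostToDevice);
        int threads = 256;
        int blocks = (int)((count + threads - 1) / threads);
        scaleKernel<<<blocks, threads>>>(dev[g], count, factor);
    }

    // Phase 2: copy results back. A blocking cudaMemcpy on the default
    // stream waits for the kernel queued on that device.
    for (int g = 0; g < numGpus; ++g) {
        size_t begin = chunkBegin(n, numGpus, g);
        size_t count = chunkBegin(n, numGpus, g + 1) - begin;
        if (count == 0) continue;
        cudaSetDevice(g);
        cudaMemcpy(host.data() + begin, dev[g], count * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(dev[g]);
    }
    return numGpus;
}
```

The split policy here (contiguous, near-equal chunks) is only one option; if the GPUs differ in speed you would probably want to weight the chunk sizes yourself.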

However, certain CUDA libraries such as cuBLAS and cuFFT (through their cublasXt and cufftXt interfaces) have library functions that can distribute work (e.g. a matrix multiply or an FFT) across multiple GPUs.
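For the library route, a sketch along these lines uses the cublasXt interface of cuBLAS to run a single-precision matrix multiply over all visible GPUs. The wrapper name `xtSgemmOnes` and the matrix contents are assumptions made for the example; the matrices live in ordinary host memory and the library decides how to tile the work.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublasXt.h>

// Hypothetical wrapper: computes C = A * B for n x n matrices where A is all
// ones and B is all twos. Returns 0 on success, nonzero otherwise.
int xtSgemmOnes(size_t n, std::vector<float>& C) {
    int numGpus = 0;
    if (cudaGetDeviceCount(&numGpus) != cudaSuccess || numGpus == 0) return 1;

    // Tell cublasXt which device indices it may use (here: all of them).
    std::vector<int> devices(numGpus);
    for (int g = 0; g < numGpus; ++g) devices[g] = g;

    cublasXtHandle_t handle;
    if (cublasXtCreate(&handle) != CUBLAS_STATUS_SUCCESS) return 2;
    if (cublasXtDeviceSelect(handle, numGpus, devices.data())
            != CUBLAS_STATUS_SUCCESS) {
        cublasXtDestroy(handle);
        return 3;
    }

    std::vector<float> A(n * n, 1.0f), B(n * n, 2.0f);
    C.assign(n * n, 0.0f);
    const float alpha = 1.0f, beta = 0.0f;

    // Host pointers are passed directly; the library moves tiles to the
    // selected GPUs and gathers the result back into C.
    cublasStatus_t st = cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                      n, n, n,
                                      &alpha, A.data(), n,
                                      B.data(), n,
                                      &beta, C.data(), n);
    cublasXtDestroy(handle);
    return st == CUBLAS_STATUS_SUCCESS ? 0 : 4;
}
```

This needs to be linked against cuBLAS (for example `nvcc example.cu -lcublas`). Note that the application never calls `cudaSetDevice` itself here; selecting the devices and splitting the work is left to the library.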

Thanks, I get it.