Need the ABCs of CUDA clusters

Hello,
I need to understand whether CUDA clusters are easy to deploy using Red Hat HPC or other HPC solutions. How does it work?
Does it virtualize all the CUDA cards in the cluster and present them to you as one system with all those cards?
Will applications that run on a CUDA-enabled desktop be able to run on a CUDA cluster? Is it transparent, or will the applications have to be recompiled?
Will the whole Nvidia CUDA toolset work flawlessly on the cluster?
From the user's perspective, is all of this complexity hidden, so that he/she never has to worry about what's going on underneath and can just write code, compile and deploy?

Red Hat is a good choice of OS, and all the common versions support CUDA.

Common practice is to use standard MPI to fan computations out to each node, and then, within each node, use precompiled CUDA binaries or libraries that offload work from the CPU to one or more GPUs.

So the GPUs are not pooled into one virtual system: developers need to write their own MPI + CUDA or CUDA library code, where MPI is responsible for all the internode communication (CUDA doesn't do that) and CUDA is responsible for leveraging the GPUs in each node (MPI doesn't do that).
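To make that division of labor concrete, here is a minimal sketch rather than a definitive implementation. It assumes an MPI-3 library and the CUDA runtime are installed on every node. The kernel name `scale`, the `CUDA_CHECK` macro, and the chunk size are all made up for illustration. MPI scatters a vector across ranks, each rank pushes its slice through a GPU on its own node, and MPI gathers the results.

```cuda
// Sketch only: MPI moves data between ranks/nodes, CUDA does the per-node GPU work.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: abort the whole MPI job if a CUDA call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s\n",                      \
                         cudaGetErrorString(err_));                       \
            MPI_Abort(MPI_COMM_WORLD, 1);                                 \
        }                                                                 \
    } while (0)

// Illustrative kernel: multiply every element by a constant.
__global__ void scale(float *x, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Find this rank's index among the ranks sharing its node,
    // then use it to choose one of the node's GPUs.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                        MPI_INFO_NULL, &node_comm);
    int local_rank = 0;
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_free(&node_comm);

    int device_count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&device_count));
    if (device_count == 0) {
        std::fprintf(stderr, "rank %d: no CUDA device found\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    CUDA_CHECK(cudaSetDevice(local_rank % device_count));

    const int chunk = 1 << 20;  // elements per rank (arbitrary choice)
    std::vector<float> full;    // only populated on rank 0
    if (rank == 0) full.assign(static_cast<size_t>(chunk) * size, 1.0f);
    std::vector<float> local(chunk);

    // MPI's job: internode communication.
    MPI_Scatter(full.data(), chunk, MPI_FLOAT,
                local.data(), chunk, MPI_FLOAT, 0, MPI_COMM_WORLD);

    // CUDA's job: offload this rank's slice to a local GPU.
    float *d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, chunk * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d, local.data(), chunk * sizeof(float),
                          cudaMemcpyHostToDevice));
    const int threads = 256;
    const int blocks = (chunk + threads - 1) / threads;
    scale<<<blocks, threads>>>(d, chunk, 2.0f);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(local.data(), d, chunk * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));

    MPI_Gather(local.data(), chunk, MPI_FLOAT,
               full.data(), chunk, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("ranks = %d, first element = %f\n", size, full[0]);

    MPI_Finalize();
    return 0;
}
```

How you build and launch this depends on your MPI installation. Typically you compile with nvcc against the MPI headers and libraries, then start the job with your MPI launcher (for example mpirun). A single-GPU desktop CUDA program will still run on any one node unchanged, but it will not spread across nodes without this kind of MPI layer added.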

Our ArrayFire Pro cluster users do the same thing: they plug the ArrayFire CUDA library (plus any other custom CUDA code they may have) into their existing MPI code, and it works pretty seamlessly. You just need to make sure the CUDA driver is installed properly on each node.
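One way to sanity-check the driver on a node is a small program like the sketch below, run once per node (for example through your job scheduler). It uses only standard CUDA runtime calls. The pass/fail policy, comparing driver version against runtime version, is my own assumption about what "installed properly" should mean, not an official test.

```cuda
// Sketch: report the driver/runtime versions and the GPUs visible on this node.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int driver = 0, runtime = 0, count = 0;

    // Versions are encoded as 1000 * major + 10 * minor.
    // A driver version of 0 means no CUDA driver was found.
    cudaDriverGetVersion(&driver);
    cudaRuntimeGetVersion(&runtime);
    std::printf("driver %d.%d, runtime %d.%d\n",
                driver / 1000, (driver % 1000) / 10,
                runtime / 1000, (runtime % 1000) / 10);

    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess)
            std::printf("GPU %d: %s\n", i, prop.name);
    }

    // Assumed policy: treat a missing GPU or a driver older than the
    // runtime this binary was built against as a failed check.
    if (count == 0 || driver < runtime) {
        std::fprintf(stderr, "node check failed\n");
        return 1;
    }
    return 0;
}
```

A nonzero exit code from any node is a hint to look at that node's driver install before running real jobs on it.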

Good luck!