Can I combine the RTX 8000 and RTX A6000 for training?
If I have 2 RTX A6000s and 1 RTX 8000, can I make my deep learning model train on all 3 GPUs?
If I combine them, what would the performance be? Will the RTX A6000 drop its performance to match the RTX 8000?
Do they even work together?
This is a generic question, not about any specific code: can these two GPUs be combined for training? If so, can they share the work, and what would the final performance be like? Does the RTX A6000 drop its performance to the level of the RTX 8000? I have one RTX 8000 and want to buy an RTX A6000, but I'm not sure whether they would work together for training a deep learning model, so I wanted to ask the community here.
The short answer is yes; the longer answer is "it depends".
In general it is possible to set up heterogeneous multi-GPU systems. Workload distribution happens either through a scheduling system (for example SLURM) or natively through your chosen deep learning framework (for example PyTorch). CUDA itself does not automatically distribute workloads.
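As a sketch of what "natively through the framework" can look like in PyTorch, the snippet below hands a toy model to every visible GPU via `nn.DataParallel`. The small `nn.Linear` is a stand-in for a real model, and this is purely illustrative (for real training, `DistributedDataParallel` is usually preferred):

```python
import torch
import torch.nn as nn

# Stand-in for a real model.
model = nn.Linear(128, 10)

if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch evenly across all visible GPUs,
    # regardless of how fast each individual card is.
    model = nn.DataParallel(model)

# Falls back to CPU on a machine without CUDA devices.
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```

Note that "evenly" is the key word here; this wrapper has no notion of one GPU being faster than another.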
The limitation comes from the supported CUDA compute capability version, which should not be confused with the overall CUDA driver version.
The RTX 8000 (Turing) supports compute capability 7.5.
The RTX A6000 (Ampere) supports compute capability 8.6, which is the latest one at the time of writing.
The most recent CUDA release right now is 11.7, and it supports compute capabilities back to 3.5, which is the Kepler architecture and newer.
That means that as long as your deep learning code only uses CUDA compute features supported on the older hardware, you can safely use both GPUs together.
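One quick way to see what each installed card reports is to query PyTorch directly; on a mixed RTX 8000 / RTX A6000 box this diagnostic sketch would print `7.5` for one device and `8.6` for the others:

```python
import torch

# List every CUDA device PyTorch can see, with its compute capability.
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"cuda:{i}  {name}  compute capability {major}.{minor}")
```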
The performance of the newer GPU will not be throttled by this on its own. The important side effect is that the newer GPU will finish its workloads faster, so your scheduler needs to be aware of this and distribute tasks accordingly!
As far as I know, PyTorch for example can only distribute workloads evenly and is not able to take per-GPU capabilities into account, which means the older GPU would effectively throttle the newer one unless you manually adjust things like the batch size per GPU or similar.
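One manual workaround along those lines is to split the global batch in proportion to a rough per-card throughput estimate, so the slower card simply gets less work per step. A minimal sketch; the relative weights below are illustrative guesses, not benchmarks:

```python
def split_batch(global_batch: int, weights: list) -> list:
    """Split global_batch into per-GPU batch sizes proportional to weights."""
    total = sum(weights)
    sizes = [int(global_batch * w / total) for w in weights]
    # Hand any remainder from rounding down to the fastest GPU.
    fastest = max(range(len(weights)), key=weights.__getitem__)
    sizes[fastest] += global_batch - sum(sizes)
    return sizes

# Hypothetical relative throughputs: RTX 8000 ~ 1.0, each RTX A6000 ~ 1.6.
print(split_batch(256, [1.0, 1.6, 1.6]))  # -> [60, 99, 97]
```

With per-GPU batch sizes like these, each card takes roughly the same wall-clock time per step, so neither one sits idle waiting for the other.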