Can I combine RTX8000 and RTXA6000

Can I combine the RTX8000 and RTXA6000 for training?
If I have two RTX A6000 GPUs and one RTX 8000, can I make training of my deep learning model use all three GPUs?

If I combine them, what would the performance be? Will the RTX A6000 drop its performance to match the RTX 8000?
Do they even work together?

I am using Ubuntu 18


Please share the model, script, profiler, and performance output, if not shared already, so that we can help you better.

Alternatively, you can try running your model with the trtexec command.

When measuring model performance, make sure you consider the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
Please refer to the below links for more details:


This is a generic question. It's not about any code. Can we combine these two GPUs for training? If so, can they share the work, and what would the final performance be like? Does the RTX A6000 drop its performance to that of the RTX 8000? I have only one RTX 8000 and want to buy an RTX A6000, but I am not sure whether they would work together when training a deep learning model, so I wanted to ask the community here.


You may need to compare their compute capability. For better input, we are moving this post to the GPU - Hardware forum.

Thank you.

Hello @narenjaz,

The short answer is yes; the longer answer is “it depends”.

In general, it is possible to set up heterogeneous multi-GPU systems. The workload distribution happens either through a scheduling system (for example SLURM) or natively through your chosen deep learning framework (for example PyTorch). CUDA itself does not automatically distribute workloads.

The limitation comes from the supported CUDA compute capability version. This should not be confused with the overall CUDA driver version.

RTX 8000 supports up to version 7.5 of compute capability.
RTX A6000 supports up to version 8.6 of compute capability, which is the latest one.

The most current CUDA release in general is v11.7 right now, and it supports compute capability back to version 3.5, which is the Kepler architecture and newer (Maxwell starts at 5.0).

That means that as long as your deep learning code uses only CUDA compute features that are supported on the older hardware, you can safely use both GPUs together.
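To make the compute capability check concrete, here is a minimal sketch in plain Python. The table only holds the two values quoted above, and the helper names (lowest_common_capability, can_run) are made up for this example. On a real machine you would not hard-code the values; in PyTorch, for instance, torch.cuda.get_device_capability(i) returns the same kind of (major, minor) tuple for device i.

```python
# Compute capabilities quoted in this thread, as (major, minor) tuples.
# The dictionary keys are only labels for this example.
CAPABILITIES = {
    "RTX 8000": (7, 5),
    "RTX A6000": (8, 6),
}


def lowest_common_capability(gpus):
    """Return the smallest compute capability among the given GPU names."""
    return min(CAPABILITIES[name] for name in gpus)


def can_run(required, gpus):
    """True if every GPU meets the (major, minor) capability a feature needs."""
    # Tuples compare element by element, so (7, 5) < (8, 0) < (8, 6).
    return lowest_common_capability(gpus) >= required


mixed = ["RTX A6000", "RTX A6000", "RTX 8000"]
print(lowest_common_capability(mixed))  # (7, 5)
print(can_run((7, 0), mixed))           # True
print(can_run((8, 0), mixed))           # False
```

In other words, the mixed setup behaves like a 7.5 system for anything that must run on all three cards. Code that needs 8.x features can only be placed on the A6000s.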

The performance of the newer GPU will not be throttled by this. The important side effect is that the newer GPU will finish its workloads faster, so your scheduler needs to be aware of this and be able to distribute tasks accordingly!

As far as I know, PyTorch, for example, can only distribute workloads evenly and is not able to take GPU capabilities into account, which means the older GPU would throttle the newer one unless you manually adjust things like the batch size per GPU.
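One way to do that manual adjustment is to split the global batch in proportion to how fast each GPU is. The sketch below is plain Python and not tied to any framework. The function name split_batch is made up for this example, and the relative speeds (1.5 for each A6000, 1.0 for the RTX 8000) are placeholder assumptions, not measurements. You would replace them with numbers from timing your own model on each card.

```python
def split_batch(global_batch, relative_speeds):
    """Split global_batch into per-GPU batch sizes proportional to speed."""
    total = sum(relative_speeds)
    # Start with the rounded-down proportional share for each GPU.
    sizes = [int(global_batch * s / total) for s in relative_speeds]
    # Hand out the samples lost to rounding, fastest GPUs first.
    remainder = global_batch - sum(sizes)
    order = sorted(range(len(sizes)),
                   key=lambda i: relative_speeds[i],
                   reverse=True)
    for i in order[:remainder]:
        sizes[i] += 1
    return sizes


# Hypothetical speeds: two RTX A6000 (1.5 each) and one RTX 8000 (1.0).
speeds = [1.5, 1.5, 1.0]
print(split_batch(96, speeds))   # [36, 36, 24]
print(split_batch(100, speeds))  # [38, 37, 25]
```

With such a split, each GPU should need roughly the same time per step, so the faster cards spend less time waiting for the slower one. Keep in mind that uneven per-GPU batches may also require weighting the gradients by batch size when they are averaged.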

I hope this helps!