I have one server with two types of GPU (RTX 3090, TITAN RTX)

Hello
I have a server with two types of GPU (RTX 3090, TITAN RTX).
NVIDIA driver: 460.32.03
CUDA: 11.1
PyTorch: 1.8.0
I'm trying to use both GPUs together for deep-learning training with PyTorch.

When I use only one type of GPU (RTX 3090 or TITAN RTX) it works,
but when I use both GPUs together it does not work.

I use nn.DataParallel to distribute the training across the GPUs (roughly as in the sketch below).
Is this a CUDA problem or an nn.DataParallel problem?
Can I use both GPUs together on one server?
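
Roughly what my training step looks like (a simplified sketch; the real model and data are much larger, and the names here are just placeholders):

```python
import torch
import torch.nn as nn

# Toy stand-in for my real model (placeholder sizes).
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

# Wrap the model so each batch is split across both visible GPUs.
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(64, 1024).cuda()          # batch is scattered to both GPUs
targets = torch.randint(0, 10, (64,)).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```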

Also, when I look at the PCI bus numbers:
TITAN RTX = 0
RTX 3090 = 1
but by default PyTorch/CUDA assigns the RTX 3090 as device 0.
Is this a CUDA 11.1 bug, or a PyTorch bug?
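
For reference, this is how I listed the devices; as far as I understand, setting CUDA_DEVICE_ORDER=PCI_BUS_ID before CUDA is initialized should make the numbering follow the PCI bus instead, but I'm not sure that is the intended behaviour:

```python
import os
# Assumption: force enumeration to follow PCI bus order.
# This must be set before torch.cuda is first used.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

import torch

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```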

What exactly does “it does not work” mean? What are the specific symptoms?

Thanks for the reply.
It means the training doesn't start.

Are there any error messages, either to the screen or to a log file?

If you describe the specific software you are running and how you are running it (e.g. command line arguments), maybe someone will be able to help. At present there doesn’t seem to be enough information provided here that would allow a third party to analyze the issue or provide advice.

What are the hardware specifications of the system, besides the two GPUs: What CPU, how much system memory? What is the nominal wattage of the power supply unit (PSU)?

Thanks for helping me

Yes, I use specific software (PyTorch).
It has a module for using multiple GPUs:
if training requires 10 GB, the module distributes the work across the GPUs,
so each GPU allocates about 5 GB.

So I suspect two possibilities (see the smoke test sketch after the list):

  1. I can't use two types of GPU with CUDA 11.1.
  2. PyTorch's distribution module does not work with two types of GPU.
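
This is the smoke test I am using to see which combination fails (a minimal hypothetical model, not my real training code):

```python
import torch
import torch.nn as nn

def smoke_test(device_ids):
    """One tiny forward/backward pass on the given GPU(s)."""
    model = nn.DataParallel(nn.Linear(256, 256), device_ids=device_ids)
    model = model.cuda(device_ids[0])
    x = torch.randn(32, 256).cuda(device_ids[0])
    model(x).sum().backward()
    print("OK on devices", device_ids)

smoke_test([0])      # one GPU alone: works
smoke_test([1])      # the other GPU alone: works
smoke_test([0, 1])   # both together: this is where training never starts
```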

I found the specifications,
but I can't find the PSU info.
CPU: 2 x Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
Memory: 128 GB

The reason I asked for the hardware configuration is because people in the machine learning community sometimes just start cramming GPUs into a system without considering the resulting power requirements.

In your system, we have 2 x 100 W for the two CPUs, 280 W + 350 W for the GPUs, 50 W for the system memory, 25 W for the motherboard components, and 5 W per mass storage unit (SSD or HDD). So at minimum a total of 910 W. That is for thermal design power (TDP), without considering power peaks due to load variance on the CPUs and GPUs. If you want rock-solid operation, you would want a power supply unit rated at 1500 W or more in this system. So it would make sense to search for the PSU specifications.
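
For transparency, the tally behind that estimate (nominal TDP figures; the storage term assumes a single SSD or HDD):

```python
# Rough TDP-based power budget for this system (nominal TDPs, not measured peaks;
# storage assumes one SSD/HDD at about 5 W).
components = {
    "2x Xeon Silver 4210R": 2 * 100,
    "TITAN RTX":            280,
    "RTX 3090":             350,
    "128 GB system memory": 50,
    "motherboard":          25,
    "1x SSD/HDD":           5,
}
total = sum(components.values())
print(f"minimum TDP total: {total} W")   # 910 W; for headroom, aim for a 1500 W+ PSU
```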

I don't use PyTorch, but I assume it can be run with numerous different configuration settings and numerous different input files. Maybe there wasn't enough memory available. Maybe multi-GPU is only supported when both GPUs are of identical type. If there is a condition that prevents PyTorch from running, I would expect there to be an error message of some sort. Or, if your software calls an API provided by PyTorch, you may need to check the error status it returns. That is why I asked about checking for error messages.

CUDA definitely supports running multiple GPUs of different types in the same system, provided they can all use the same driver. In fact, that is a very common scenario. The Titan RTX is a Turing-architecture GPU, while the RTX 3090 is an Ampere-architecture GPU. Both are supported by CUDA 11.1 and the driver that ships with it.
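
I don't use PyTorch myself, so treat the exact calls below as an assumption, but enumerating the devices from within the framework should confirm that both GPUs (Turing, compute capability 7.5, and Ampere, compute capability 8.6) are visible under the single driver:

```python
import torch

# List what the CUDA runtime exposes to PyTorch: device count, names,
# compute capabilities, and memory sizes.
print("CUDA:", torch.version.cuda, "| devices:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    p = torch.cuda.get_device_properties(i)
    print(i, p.name, f"compute capability {p.major}.{p.minor}",
          f"{p.total_memory / 2**30:.1f} GiB")
```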