I have one server with two types of GPU (RTX 3090, TITAN RTX)

Hello
I have a server with two types of GPU (RTX 3090, TITAN RTX).
NVIDIA driver: 460.32.03
CUDA: 11.1
PyTorch: 1.8.0
I'm trying to use both GPUs together for deep-learning training with PyTorch.

When I use only one type of GPU (RTX 3090 or TITAN RTX) it works,
but when I use both GPUs together it does not work.

I use nn.DataParallel to distribute the training across the GPUs (roughly as in the sketch below).
Is this a CUDA problem or an nn.DataParallel problem?
Can I use both GPUs together on one server?
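
Roughly what my training step looks like (a simplified sketch; the real model and data are much larger, and the names here are just placeholders):

```python
import torch
import torch.nn as nn

# Toy stand-in for my real model (placeholder sizes).
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

# Wrap the model so each batch is split across both visible GPUs.
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(64, 1024).cuda()          # batch is scattered to both GPUs
targets = torch.randint(0, 10, (64,)).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```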

Also, when I look at the PCI bus numbers:
TITAN RTX = 0
RTX 3090 = 1
but by default PyTorch/CUDA assigns the RTX 3090 as device 0.
Is this a CUDA 11.1 bug, or a PyTorch bug?
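
For reference, this is how I listed the devices; as far as I understand, setting CUDA_DEVICE_ORDER=PCI_BUS_ID before CUDA is initialized should make the numbering follow the PCI bus instead, but I'm not sure that is the intended behaviour:

```python
import os
# Assumption: force enumeration to follow PCI bus order.
# This must be set before torch.cuda is first used.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

import torch

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```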

What exactly does “it does not work” mean? What are the specific symptoms?

Thanks for the reply.
It means the training doesn't start.

Are there any error messages, either to the screen or to a log file?

If you describe the specific software you are running and how you are running it (e.g. command line arguments), maybe someone will be able to help. At present there doesn’t seem to be enough information provided here that would allow a third party to analyze the issue or provide advice.

What are the hardware specifications of the system, besides the two GPUs: What CPU, how much system memory? What is the nominal wattage of the power supply unit (PSU)?

Thanks for helping me

Yes, I use specific software (PyTorch).
It has a module for using multiple GPUs:
if training requires 10 GB, the module distributes the work across the GPUs,
so each GPU allocates about 5 GB.

So I suspect two possibilities (see the smoke test sketch after the list):

  1. I can't use two types of GPU with CUDA 11.1.
  2. PyTorch's distribution module does not work with two types of GPU.
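
This is the smoke test I am using to see which combination fails (a minimal hypothetical model, not my real training code):

```python
import torch
import torch.nn as nn

def smoke_test(device_ids):
    """One tiny forward/backward pass on the given GPU(s)."""
    model = nn.DataParallel(nn.Linear(256, 256), device_ids=device_ids)
    model = model.cuda(device_ids[0])
    x = torch.randn(32, 256).cuda(device_ids[0])
    model(x).sum().backward()
    print("OK on devices", device_ids)

smoke_test([0])      # one GPU alone: works
smoke_test([1])      # the other GPU alone: works
smoke_test([0, 1])   # both together: this is where training never starts
```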

I found the specifications,
but I can't find the PSU info.
CPU: 2 x Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
Memory: 128 GB

The reason I asked for the hardware configuration is because people in the machine learning community sometimes just start cramming GPUs into a system without considering the resulting power requirements.

In your system, we have 2 x 100 W for the two CPUs, 280 W + 350 W for the GPUs, 50 W for the system memory, 25 W for the motherboard components, and 5 W per mass storage unit (SSD or HDD). So at minimum a total of 910 W. That is for thermal design power (TDP), without considering power peaks due to load variance on the CPUs and GPUs. If you want rock-solid operation, you would want a power supply unit rated at 1500 W or more in this system. So it would make sense to search for the PSU specifications.
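
For transparency, the tally behind that estimate (nominal TDP figures; the storage term assumes a single SSD or HDD):

```python
# Rough TDP-based power budget for this system (nominal TDPs, not measured peaks;
# storage assumes one SSD/HDD at about 5 W).
components = {
    "2x Xeon Silver 4210R": 2 * 100,
    "TITAN RTX":            280,
    "RTX 3090":             350,
    "128 GB system memory": 50,
    "motherboard":          25,
    "1x SSD/HDD":           5,
}
total = sum(components.values())
print(f"minimum TDP total: {total} W")   # 910 W; for headroom, aim for a 1500 W+ PSU
```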

I don't use PyTorch, but I assume it can be run with numerous different configuration settings and numerous different input files. Maybe there wasn't enough memory available. Maybe multi-GPU is only supported when both GPUs are of identical type. If there is a condition that prevents PyTorch from running, I would expect there to be an error message of some sort. Or, if your software calls an API provided by PyTorch, you may need to check the error status it returns. That is why I asked about checking for error messages.

CUDA definitely supports running multiple GPUs of different types in the same system, provided they can all use the same driver. In fact, that is a very common scenario. The Titan RTX is a Turing-architecture GPU, while the RTX 3090 is an Ampere-architecture GPU. Both are supported by CUDA 11.1 and the driver that ships with it.
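
I don't use PyTorch myself, so treat the exact calls below as an assumption, but enumerating the devices from within the framework should confirm that both GPUs (Turing, compute capability 7.5, and Ampere, compute capability 8.6) are visible under the single driver:

```python
import torch

# List what the CUDA runtime exposes to PyTorch: device count, names,
# compute capabilities, and memory sizes.
print("CUDA:", torch.version.cuda, "| devices:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    p = torch.cuda.get_device_properties(i)
    print(i, p.name, f"compute capability {p.major}.{p.minor}",
          f"{p.total_memory / 2**30:.1f} GiB")
```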