The reason I asked for the hardware configuration is that people in the machine learning community sometimes just cram GPUs into a system without considering the resulting power requirements.
In your system, we have 2 x 100W for the two CPUs, 280W + 350W for the GPUs, 50W for the system memory, 25W for the motherboard components, and 5W per mass storage device (SSD or HDD). With one drive, that adds up to a minimum of 910W. Those are thermal design power (TDP) figures, without considering power peaks due to load variance on the CPUs and GPUs. If you want rock-solid operation, you would want a power supply unit rated at 1500W or more for this system. So it would make sense to check the PSU specifications.
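As a quick back-of-the-envelope tally (the wattages are the figures from your configuration; the 1.6x headroom factor is a rule of thumb I use for load transients, not a hard spec):

```python
# Rough power budget for the described system (TDP figures, not peak draw)
components = {
    "CPUs (2 x 100W)":       2 * 100,
    "Titan RTX":             280,
    "RTX 3090":              350,
    "System memory":         50,
    "Motherboard":           25,
    "Mass storage (1 x 5W)": 5,
}

tdp_total = sum(components.values())   # 910W minimum at TDP
recommended_psu = tdp_total * 1.6      # headroom for load transients

print(f"TDP total:      {tdp_total} W")
print(f"Suggested PSU: >={recommended_psu:.0f} W")
```

That lands at roughly 1450W, which is why I would round up to a 1500W unit.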
I don’t use PyTorch, but I assume it can be run with numerous different configuration settings and numerous different input files. Maybe there wasn’t enough memory available. Maybe multi-GPU is only supported when both GPUs are of identical type. If some condition prevents PyTorch from running, I would expect an error message of some sort. Or, if your software calls an API provided by PyTorch, you may need to check the error status it returns. That is why I asked about checking for error messages.
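Since I don’t use PyTorch myself, take this only as a sketch of the kind of sanity check I would run first; the `torch.cuda` calls are standard, but the tensor sizes are just a hypothetical repro:

```python
import torch

# If something prevents PyTorch from using the GPUs, it normally raises a
# RuntimeError with a specific message -- capture it rather than guessing.
print("CUDA available:", torch.cuda.is_available())
print("Device count:  ", torch.cuda.device_count())

try:
    # Hypothetical repro: allocate on each GPU the training run would use
    a = torch.randn(1024, 1024, device="cuda:0")
    b = torch.randn(1024, 1024, device="cuda:1")
except RuntimeError as e:
    print("CUDA error message:", e)
```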
CUDA definitely supports running multiple GPUs of different types in the same system, provided they can all use the same driver. In fact, that is a very common scenario. The Titan RTX is a Turing-architecture GPU, while the RTX 3090 is an Ampere-architecture GPU. Both are supported by CUDA 11.1 and the driver that ships with it.
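You can verify this yourself by enumerating the devices; a mixed Turing/Ampere pair should show up as two entries with different compute capabilities. A minimal check, using the same `torch.cuda` calls as above:

```python
import torch

# Mixed Turing/Ampere systems enumerate normally under one driver; each
# device simply reports its own compute capability.
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    cc = torch.cuda.get_device_capability(i)  # (7, 5) = Turing, (8, 6) = Ampere
    print(f"GPU {i}: {name}, compute capability {cc[0]}.{cc[1]}")
```

If both cards are listed here, the mixed-architecture setup itself is not the problem, and I would look at the PyTorch configuration or memory usage instead.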