4x RTX 3090 setup in a single machine


Can I ask regarding hardware setup in this forum?

In a 2x RTX 3090 setup, I can use a 3- or 4-slot (mainly 4-slot) NVLink bridge to transfer data between the cards.

But suppose I want to set up 4 RTX cards for very heavy rendering computation. I know it is possible with a distributed system: two machines (each with 2 RTX cards) connected over an Ethernet cable.

But if I want to fit all 4 RTX cards in a single machine, is there a specific motherboard available for that? If so, how can I connect all 4 cards? Will NVLink work in this case?

Hello @_Bi2022 ,

I don’t think it is possible to give a generic recommendation on a perfect 4x RTX GPU setup; it completely depends on your use case and budget. While there are, for example, single-CPU enterprise desktop mainboards with 4 PCIe x16 slots available, you might run into power and cooling issues if you try to build the system to normal desktop standards with four RTX 3090s. NVLink is not the limiting factor, as long as all PCIe slots provide full bandwidth. Of course, the CPU itself needs to support the required number of PCIe lanes, which limits your choices to the very high end.
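To illustrate the power concern, here is a back-of-the-envelope sketch. The board power, system allowance, and headroom figures are assumptions for illustration, not specs for any particular build; check your exact components:

```python
# Rough power-budget sketch for a 4x RTX 3090 build.
# Assumptions: ~350 W board power per RTX 3090 (transient spikes can be
# much higher), ~400 W for a high-lane-count CPU, board, drives, and fans,
# and a common ~25% PSU headroom rule of thumb.
GPU_TDP_W = 350
NUM_GPUS = 4
CPU_AND_REST_W = 400
PSU_HEADROOM = 1.25

steady_state = NUM_GPUS * GPU_TDP_W + CPU_AND_REST_W
recommended_psu = steady_state * PSU_HEADROOM
print(steady_state)     # -> 1800 (watts, steady state)
print(recommended_psu)  # -> 2250.0 (watts of PSU capacity)
```

Even with these conservative numbers, the result is well beyond what a normal desktop power supply (and a single household circuit in some countries) comfortably delivers, which is why such builds drift toward workstation or server territory.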

That is why most developers look at high-end graphics workstations or server setups for multi-GPU solutions. Those are sold as enterprise solutions with proper vendor support.

I am sorry if I cannot offer a specific recommendation, but maybe someone else here in the forums had to make that same decision before.



As there are rumors everywhere that RTX prices are dropping, I was thinking of something like a crypto-mining rig, but for real-time ray tracing. Not limited to 4 GPUs; maybe 6 or 8 GPUs. From your answer, I now realize it would not be as easy as I thought. Setting aside the power supply and cooling, I did not clearly understand the data transmission between different GPUs. Did you mean that generic PCIe data transmission would be enough for a multi-GPU setup?

NVLink is not the limiting factor, as long as all PCIe slots provide full bandwidth. Of course, the CPU itself needs to support the required number of PCIe lanes, which limits your choices to the very high end.

Actually, I don’t clearly know yet. I currently have 2 RTX cards and a 4-slot NVLink bridge. I thought NVLink was the only way for two RTX cards to communicate. If I do not use NVLink, can I still transfer data (maybe at a lower transmission rate)?

Hi again!

I am sorry if I was not clear in my mention of NVLink. NVLink for consumer GPUs, that is, the RTX 30xx generation, is limited to the RTX 3090 and to only 2 GPUs connected at a time. That means if you use more GPUs you may run into bandwidth bottlenecks depending on the underlying motherboard and the PCIe lanes, but not because of NVLink.

That said, if you plan to create a multi-GPU system with 4 or more GPUs, inter-GPU communication is of course still possible, but your bandwidth will be limited by PCIe speeds, since that bus is what will be used for data transfer.

For example, on a typical current desktop system with a consumer CPU, you would need to split your PCIe lanes between the GPUs, meaning with 2 GPUs you would get PCIe x8 each, and with 4 GPUs PCIe x4 each. Depending on your workloads, that can still be good enough. For real-time ray tracing, it becomes a question of whether the software actually supports workload balancing across multiple GPUs.
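The lane split above is simple arithmetic. A small sketch, assuming roughly 2 GB/s per PCIe 4.0 lane (an approximation; real usable bandwidth per lane is slightly lower) and the 16 lanes a typical desktop CPU dedicates to graphics:

```python
# Approximate per-GPU PCIe bandwidth when a fixed pool of lanes is split.
# Assumption: PCIe 4.0 at roughly 2 GB/s per lane, 16 CPU lanes for graphics.
PCIE4_GBPS_PER_LANE = 2.0
TOTAL_LANES = 16

for num_gpus in (1, 2, 4):
    lanes_each = TOTAL_LANES // num_gpus
    bw_each = lanes_each * PCIE4_GBPS_PER_LANE
    print(f"{num_gpus} GPU(s): x{lanes_each} each, ~{bw_each:.0f} GB/s per GPU")
# -> 1 GPU(s): x16 each, ~32 GB/s per GPU
# -> 2 GPU(s): x8 each, ~16 GB/s per GPU
# -> 4 GPU(s): x4 each, ~8 GB/s per GPU
```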


It’s all clear now. Thanks a lot for all the explanation.

Just so I understand: suppose there is one machine with 4 GPUs, each at PCIe x4. On the other side, there are two machines with 2 GPUs each (connected via NVLink), set up as a distributed system connected over high-speed Ethernet cables for data transmission. In your experience, which one should be faster for a real-time ray tracing application? I think the second setup would be faster than the first. What do you think?

If we look again at normal desktop machines, the typical network connection is Gigabit Ethernet, meaning 1 Gbit/s, or roughly 0.125 GB/s. In comparison, PCIe 4.0 x16 achieves about 32 GB/s. So no, distributed systems connected over normal Ethernet are rarely faster than systems connected via the PCIe bus. To achieve that, you would need to go to server systems with specialized NICs and switches, but then you would also not be looking at GeForce GPUs.


Thanks a lot, I have a clear idea now.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.