GDS Hardware Configuration Specifications

I am planning to set up GDS.

I would like to know more about the GDS hardware specifications.

The configuration has 8 HGX A100 and 8 200G IB ports, which are used for RDMA. Does GDS need a separate 100G IB port for storage?

If so, how many 100G IB ports are needed in total?

Hi @his-ckkim,

You posted in the Community Feedback section, but unfortunately no support resources are watching that category. I think this would be best served in the GPU Hardware category, so I will move it over for you.

Best,
Tom K

Hello @his-ckkim !

In general, your system provider or the OEM you procure the systems from should be able to answer these questions. You should also have access to Enterprise support, who are happy to help.

Nevertheless, I can try to answer some of it as well.

GDS as such is not directly bound to InfiniBand, which is usually dedicated to the GPU interconnect and what is called “GPU Direct” (RDMA). GDS allows direct storage access from the GPU to any PCIe storage device. Whether and how this is possible depends very much on the underlying server architecture you are using, so I can’t give you specific advice on this. But you can find more details in our GDS Design Guide.

If you want to design a more complex setup of several HGX machines, InfiniBand is used to interconnect the servers and allow inter-GPU communication across them.
Only if you also want to set up shared storage between servers would it make sense to add networking capacity, and then only if the existing IB ports are really all used for GPU Direct. Depending on your architecture, you might then also need IB switches.
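
To make the port-count question a bit more concrete, here is a rough sizing sketch in Python. It only does the nominal line-rate arithmetic (8 bits per byte) and ignores protocol overhead, PCIe topology, and what the storage system can actually deliver. The 25 GB/s storage target and the function names are purely hypothetical assumptions for illustration, not a recommendation; the real numbers have to come from your system provider.

```python
import math

BITS_PER_BYTE = 8


def aggregate_gbytes_per_s(port_count: int, port_gbit_per_s: float) -> float:
    """Nominal aggregate line rate in GB/s, ignoring protocol overhead."""
    return port_count * port_gbit_per_s / BITS_PER_BYTE


def ports_needed(target_gbytes_per_s: float, port_gbit_per_s: float) -> int:
    """Smallest number of ports whose nominal line rate covers the target."""
    return math.ceil(target_gbytes_per_s * BITS_PER_BYTE / port_gbit_per_s)


# The 8 x 200G IB ports from the question, assumed fully used for GPU traffic.
compute_fabric = aggregate_gbytes_per_s(8, 200)

# Hypothetical shared-storage throughput target in GB/s (an assumption).
storage_target = 25.0
storage_ports = ports_needed(storage_target, 100)

print(f"GPU fabric, nominal: {compute_fabric} GB/s")
print(f"100G ports for {storage_target} GB/s of storage traffic: {storage_ports}")
```

If the existing 200G ports are not saturated by GPU traffic, the extra-port count from this kind of estimate may well be zero, which is why the architecture discussion matters more than the arithmetic.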

As stated initially, I recommend that you discuss your planned server setup with your system provider and ask for guidelines, or engage directly with Enterprise support. They are much better placed to help you with your custom needs.

Thank you!