Server

I would like to purchase a server in which I can have four V100 boards (SXM2-based, with NVLink)
and two other PCIe boards (dual-slot width, full length, Gen3 x16).

I could not find sufficiently detailed technical info about the DGX Station, but from what I have heard it does not have additional PCIe slots.

I imagine that some of you here have a DGX Station, so I would appreciate it if you could send me the station's technical user manual so that I can check this.

Is there another server vendor with which I can replicate the DGX Station configuration of four V100 boards connected via NVLink,
plus additional PCIe slots?

thanks

You are correct - DGX Station doesn’t have SXM2 GPUs (they are PCIe Tesla V100, with a 4x NVLink bridge) or additional slots. (The user guide can be found at https://docs.nvidia.com/dgx/dgx-station-user-guide/index.html )

A good starting point for finding servers like you’re asking for would be https://www.nvidia.com/en-us/data-center/tesla/tesla-qualified-servers-catalog/ , although I don’t know offhand of a server that has what you’re asking for.

Thanks for your quick reply. But I am more confused now.

You write that the DGX Station is not based on SXM2 GPUs but on the PCIe V100?

I am confused because here https://www.nvidia.com/en-us/data-center/tesla-v100/ it says that NVLink
comes with the NVIDIA Tesla V100 that is based on SXM2 (at least that's what the picture shows), whereas the NVIDIA Tesla V100 for PCIe does not mention NVLink.

I guess from your answer that there is another way to get NVLink, via a bridge, and that it is different from the way it is done on SXM2?

If so, then if I buy a server with standard PCIe slots, will I be able to plug four V100 PCIe boards (not the SXM2 boards) into it? Can I purchase the NVLink bridge? Can you refer me to a technical document that shows what I need and how to achieve this?

The GPUs used in the DGX Station are a special V100 variant that is only available in the DGX Station.

Yes, with some GPUs it’s possible to get NVLink connectivity with a bridge, and yes, that is different from an SXM2 form-factor GPU.

The only two V100 options you have outside of DGX systems are the V100 PCIe (no NVLink connectivity) or the V100 SXM2 (NVLink connectivity).

Indeed, the DGX Station is unique in that it has 4x V100 GPUs in the PCIe form factor, with NVLink via a bridge across the top of the cards (as you can see in https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/dgx-station/dgx-station-print-Infographic-738375-nvidia-web.pdf ), and with display output - that is not available in other V100 PCIe based systems. You had just brought up DGX Station earlier, which is the only way to get that functionality in the PCIe form factor. :-)
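By the way, whichever form factor you end up with, you can verify the NVLink / peer-to-peer connectivity from software once the GPUs are installed. Here is a minimal sketch using the CUDA runtime API (just an illustration on my part, assuming a standard CUDA toolkit install; `nvidia-smi topo -m` and `nvidia-smi nvlink --status` report the same information from the command line):

```cpp
// peer_check.cu - report which GPU pairs can access each other's memory
// directly (peer-to-peer). On NVLink-connected V100s this should be "yes"
// for every pair; on plain PCIe it depends on the PCIe topology.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("Found %d CUDA device(s)\n", n);

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            // Asks the driver whether device i can map device j's memory.
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("GPU %d -> GPU %d peer access: %s\n", i, j,
                   canAccess ? "yes" : "no");
        }
    }
    return 0;
}
```

Compile with `nvcc peer_check.cu -o peer_check` (the file name is just an example) and run it on the target system.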

If you need to have 4x V100 GPUs connected via NVLink, then in general a server with SXM2 form-factor GPUs is the right way to go. What about something like https://www.supermicro.com/en/products/system/1U/1029/SYS-1029GQ-TVRT.cfm ?

Thanks for clarifying this for me.

By the way, the Supermicro link looks like what I need.
It can take four SXM2-based V100s, and it has four PCIe slots, as I need.

But do you think the cooling (fan-based) is sufficient? (The DGX Station has liquid cooling.) Would this mean that I have to place the server in an air-conditioned room, like in a data center? Is there a recommended operating temperature specification for the V100 SXM2 boards? Can you refer me to a link with the technical specs?

Yeah… whereas the DGX Station is a workstation product (meant to be used in an office environment), that Supermicro system, and almost every other “server” system, is really intended for operation within a datacenter, with managed inlet temperatures, airflow, etc. It’s very much driven by the server designer, so your best bet is to talk to (e.g.) Supermicro about the specific specs.
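On the operating-temperature question: the boards expose their current temperature and throttle thresholds through NVML, so whatever system you pick you can monitor how close the GPUs run to their limits. A minimal sketch using the NVML C API (again just my illustration, assuming the nvml.h header and library that ship with the CUDA toolkit and driver; `nvidia-smi -q -d TEMPERATURE` prints the same fields):

```cpp
// temp_check.cpp - print each GPU's current temperature and the threshold
// at which it starts throttling clocks.
#include <cstdio>
#include <nvml.h>

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);

        unsigned int temp = 0, slowdown = 0;
        // Current die temperature in degrees C.
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);
        // Temperature at which the GPU begins reducing clocks to protect itself.
        nvmlDeviceGetTemperatureThreshold(dev, NVML_TEMPERATURE_THRESHOLD_SLOWDOWN,
                                          &slowdown);
        printf("GPU %u: %u C (slowdown threshold %u C)\n", i, temp, slowdown);
    }

    nvmlShutdown();
    return 0;
}
```

Link against the NVML library (e.g. `g++ temp_check.cpp -o temp_check -lnvidia-ml`, include path as needed). The ambient/inlet limits for the chassis itself are still something to get from the server vendor's spec sheet.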

It sounds like you’ve got some specific requirements, though, so I’d recommend talking to any one of our NPN Elite Solution Provider partners - https://www.nvidia.com/en-us/data-center/where-to-buy-tesla/ . They can help guide you on which systems might be the best fit.

I spoke to one partner. They don’t recommend having four V100 SXM2 boards and two Xilinx U250 boards all in one system.

So I am looking now at the V100 PCIe instead.

Here
https://images.nvidia.com/content/tesla/pdf/Tesla-V100-PCIe-Product-Brief.pdf
it says the V100 PCIe is passively cooled (no onboard fan; it relies on the server chassis for airflow).

Is there a version with active cooling?

There is no version of the Tesla V100 PCIe (or any Tesla V100 variant, except the one in the DGX Station, which you cannot order separately) with active cooling.

The closest product to that description would be the Quadro GV100.
