Dear Mellanox Gurus,
Could you please advise on the pros and cons of Omni-Path vs. Mellanox + Ethernet vs. InfiniBand?
Thanks in advance, and happy Monday!
The most important element in the interconnect world is the interconnect's ability to increase application performance. The ability to offload the CPU, to execute MPI operations in the interconnect hardware, to provide hardware-based RDMA and more (in short, an offloading architecture) is the critical element in building the most efficient high-performance systems. PathScale did not succeed with its InfiniPath network products, nor QLogic with its TrueScale products, because they lacked any offloading capabilities and needed CPU cycles for everything related to the network. Omni-Path is architecturally the same as TrueScale.
Intel tries to move the discussion to the number of ports and similar metrics because it doesn't want users to focus on the most important item: offloading versus non-offloading. The right number of ports on a switch ASIC is whatever fits the sweet spot of the underlying technology. The fact that Intel had to find something to "show a benefit" and designed a 48-port ASIC does not make it a better device. The Intel switch has higher latency than the Mellanox switch (110 ns vs. 90 ns), and it requires a special data protection mechanism (LLR) on every port and at every cable length. Mellanox does not need such mechanisms for 2 m copper cables and 30 m fiber cables (enough for a data center design). Moreover, if you look at the Intel switch design (pictured on their website), you can see a very unusual front panel, which is due to signal integrity issues.
For both Mellanox and Intel, each switch port consists of 4 lanes, so Intel has no advantage here.
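A rough back-of-the-envelope sketch of how switch radix and per-hop latency interact, assuming a non-blocking two-level fat tree (leaf/spine) in which traffic between hosts on different leaf switches crosses three switch hops. The 36/48 port counts and the 90 ns / 110 ns per-hop figures come from this thread; the topology model and the function names are illustrative assumptions, and the model ignores cable, adapter, and software latency.

```python
"""Back-of-the-envelope fat-tree model (illustrative assumption, not vendor data)."""


def two_level_fat_tree_hosts(radix: int) -> int:
    """Max hosts in a non-blocking two-level fat tree built from radix-port switches.

    Each leaf switch uses radix/2 ports for hosts and radix/2 ports as uplinks;
    each spine switch has `radix` ports, so up to `radix` leaf switches fit.
    """
    if radix <= 0 or radix % 2:
        raise ValueError("radix must be a positive even number")
    return radix * (radix // 2)


def switch_latency_ns(per_hop_ns: int, hops: int) -> int:
    """Switch-only latency along a path (ignores cables, adapters, software)."""
    return per_hop_ns * hops


if __name__ == "__main__":
    # Assumed path: leaf -> spine -> leaf = 3 switch hops between different leaves.
    HOPS = 3
    for name, radix, per_hop in (("36-port ASIC", 36, 90), ("48-port ASIC", 48, 110)):
        hosts = two_level_fat_tree_hosts(radix)
        latency = switch_latency_ns(per_hop, HOPS)
        print(f"{name}: up to {hosts} hosts in two levels, "
              f"{latency} ns switch latency over {HOPS} hops")
```

Under these assumptions, the 36-port/90 ns case gives up to 648 hosts and 270 ns of switch latency, while the 48-port/110 ns case gives up to 1,152 hosts and 330 ns: a larger radix fits more hosts per tier, while lower per-hop latency pays off on every hop.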
As for your second question, the answer is no. Intel does not lock its CPUs to Omni-Path, and I don't think it ever will. Intel wants to sell CPUs, as this is its primary business. Intel created Omni-Path to try to differentiate itself from the other CPU vendors. Omni-Path does not deliver the same performance and efficiency as Mellanox InfiniBand; it is clearly not proven and is not actually deployed out there yet. The best option is Intel CPUs with a Mellanox interconnect, and Intel knows that too.
To overcome the performance limitations of today’s HPC systems we need an intelligent interconnect. The interconnect becomes a co-processor, offloading the CPU, increasing data center efficiency.
Intel Omni-Path is a no-offload, proprietary network product. It is the same old PathScale "InfiniPath" (and QLogic "TrueScale") product, running at a higher network speed. It does not support RDMA, HPC offloads, cloud offloads, or any other network offloads. It requires the CPU to handle all network operations, which results in lower CPU efficiency (high overhead) …
So who is Omni-Path good for? Intel: it will require users to buy more CPUs to try to overcome the lower data center efficiency.
And why does Intel push an inferior network technology? To try to show value versus its CPU competitors (ARM, Power, etc.).
Mellanox InfiniBand delivers higher performance than Omni-Path promises: higher message rate, lower latency, lower power consumption, and an estimated 2X higher system performance and efficiency.
The Mellanox EDR solution is robust, working, and delivering scalable performance. Omni-Path is not.
Could you explain or comment on a few points of your answer:
- Why doesn't Mellanox produce switch chips with more than 36 ports? I mean switches with a full-matrix link between all ports. I think that would be attractive when you want a topology other than a fat tree.
Why is the number of ports limited?
In Omni-Path (as I understand from the documentation), the number of ports is increased by creating super ports, each divided into 4 ports with smaller bandwidth. Why doesn't Mellanox take this path?
- Is Intel creating a full vendor lock-in to Omni-Path + Intel processors (integrating all controllers into the CPU)?
If so, what will Mellanox do?
Many thanks for the detailed answer. I have learned a lot of new things.
Can I clarify a few points?
Mellanox does not need to use those mechanisms for 2m copper cables and 30m fiber cables (enough for a datacenter design)
Is this also true for all other cable lengths, or only for 2 m copper and 30 m fiber?
- And does Mellanox plan to increase the number of ports on its switch ASIC?