To me, it is no surprise that Mellanox, probably the manufacturer with the longer tenure in InfiniBand, trounced Intel’s Omni-Path implementation in scalability. I suspect it boils down to this difference:
Mellanox has an offload model, which tries to offload as much of the network processing from the CPUs in the cluster to the host adapters and the switch as is possible. …, about 70 percent of the load of the MPI protocol and other related protocols for HPC is shifted to the network, freeing up CPU cycles to do HPC work. [On the other hand, Intel’s]…Omni-Path 100 interconnects employ an onload model, where a lot of this MPI and related processing is done across the distributed compute in the cluster.
Intel probably chose an onload implementation because it wants its customers to keep buying high-end Xeon processors. Architecturally, I think that is a mistake, and the benchmark numbers confirm it.
I worked at U.S. national exchanges and high-frequency trading firms, where we relied heavily on the offload model not only to lessen the workload on CPUs, but also to ensure that the InfiniBand adapters could read and write the host’s memory directly, without involving the host’s device drivers or operating system. It is an incredibly efficient technique that is also being researched by certain commercial database vendors who shall remain unnamed.
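To put rough numbers on the offload-versus-onload difference quoted above, here is a back-of-the-envelope sketch. The function name `app_cpu_share` and the 30 percent protocol overhead are my own hypothetical choices for illustration, not measurements. The only figure taken from the quote is the roughly 70 percent of protocol load shifted to the network.

```python
def app_cpu_share(protocol_share: float, offloaded_fraction: float) -> float:
    """Return the fraction of CPU time left for application (HPC) work.

    protocol_share: fraction of CPU time that MPI and related protocol
        processing would consume if done entirely on the host
        (a hypothetical input, not a measured value).
    offloaded_fraction: fraction of that protocol processing moved to the
        host adapters and switches (0.0 models a pure onload design).
    """
    for name, value in (("protocol_share", protocol_share),
                        ("offloaded_fraction", offloaded_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Protocol work that still runs on the host CPUs after offloading.
    remaining_protocol = protocol_share * (1.0 - offloaded_fraction)
    return 1.0 - remaining_protocol


# Assumed: protocol processing would cost 30% of CPU time if fully onloaded.
onload = app_cpu_share(protocol_share=0.30, offloaded_fraction=0.0)
# About 70% of that protocol load shifted to the network, per the quote.
offload = app_cpu_share(protocol_share=0.30, offloaded_fraction=0.70)

print(f"onload:  {onload:.0%} of CPU time left for HPC work")
print(f"offload: {offload:.0%} of CPU time left for HPC work")
```

Under that assumed overhead, the CPU share left for application work rises from 70 percent to 91 percent. The real gain depends entirely on how communication-heavy the workload is, so treat this as a way to reason about the trade-off rather than as a benchmark.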