Hi!
I am trying to build an application using NVSHMEM on a cluster that uses Slingshot 11 with libfabric as the transport backend. The same application scales really well on an IB cluster (without IBGDA) but poorly on this libfabric machine.
I wanted to understand whether there are specific flags that need to be set to optimise the performance of NVSHMEM over libfabric.