When communicating over long-distance RDMA, where the RTT is about 30 ms, the Mellanox NIC retransmits three times within a single RTT. Is it possible to configure this retransmission timeout? NCCL_IB_TIMEOUT and MAX_PACKET_LIFETIME are currently both set to 31, but this has no effect.
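For context, here is a rough sketch of what a timeout value of 31 would nominally mean. It assumes the usual InfiniBand local ACK timeout formula, 4.096 µs × 2^timeout, which is also how the NCCL documentation describes NCCL_IB_TIMEOUT. The function names are made up for illustration, and whether the NIC actually honours the value on this link is exactly what is in question.

```python
import math

BASE_US = 4.096  # base unit of the IB local ACK timeout, in microseconds


def ack_timeout_seconds(exponent: int) -> float:
    """Nominal local ACK timeout for a 5-bit exponent (hypothetical helper).

    Assumes timeout = 4.096 us * 2**exponent; 0 is treated as 'wait forever'.
    """
    if not 0 <= exponent <= 31:
        raise ValueError("exponent must fit in 5 bits (0..31)")
    if exponent == 0:
        return math.inf
    return BASE_US * 1e-6 * (2 ** exponent)


def min_exponent_for_rtt(rtt_seconds: float) -> int:
    """Smallest exponent whose nominal timeout is at least one RTT."""
    for exponent in range(1, 32):
        if ack_timeout_seconds(exponent) >= rtt_seconds:
            return exponent
    raise ValueError("RTT exceeds the largest representable timeout")


if __name__ == "__main__":
    rtt = 0.030  # 30 ms round trip, as described above
    print("smallest exponent covering the RTT:", min_exponent_for_rtt(rtt))
    print("nominal timeout at 31 (seconds):", ack_timeout_seconds(31))
```

Under that formula, a 30 ms RTT would need an exponent of at least 13 (about 33.6 ms), and 31 would correspond to well over two hours, so seeing three retransmissions within one RTT suggests the configured value is not the one in effect on the queue pair.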
Hi Liuzhun,
Thank you for posting your query on the NVIDIA Community. This question requires internal escalation to the Engineering Team, for which an active entitlement is needed. In addition, details on the Part Number, PSID and Firmware/MLNX_OFED version are needed.
If there is an active entitlement/support contract in place, please do not hesitate to open a support ticket by emailing enterprisesupport@nvidia.com
For contracts, please reach out to Networking-Contracts@nvidia.com
Thanks,
Namrata.
ConnectX-6 Dx EN adapter card; 100GbE; Single-port QSFP56; PCIe 4.0 x16;
MLNX_OFED_LINUX-23.10-2.1.3.1
Image type: FS4
FW Version: 22.42.1000
FW Release Date: 8.8.2024
Product Version: 22.42.1000
Rom Info: type=UEFI version=14.35.15 cpu=AMD64,AARCH64
type=PXE version=3.7.500 cpu=AMD64
Description: UID GuidsNumber
Base GUID: b8cef603002652a6 2
Base MAC: b8cef62652a6 2
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000434
Security Attributes: N/A
[PN] Part number: MCX623105AN-CDAT