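Hi NVSHMEM developers,
I noticed that in IBGDA, NVSHMEM uses ah_attr.dlid, which is set to 49152 for RoCE v2, as the UDP source port (udp_sport). If every QP uses the same source port, traffic between a given pair of endpoints is likely to be poorly load-balanced across the network, because switches typically include the UDP source port in their ECMP hash and the RoCE v2 destination port is fixed. What is the reasoning behind fixing udp_sport to 49152? Is it necessary? The relevant code is: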
static int ibgda_rc_init2rtr(nvshmemt_ibgda_state_t *ibgda_state, struct ibgda_ep *ep,
                             const struct ibgda_device *device, int portid,
                             struct ibgda_rc_handle *peer_ep_handle) {
    ...
    assert(roce_version == 1 || roce_version == 2);
    ah_attr.dlid = port_attr->lid | (roce_version == 1 ? IBGDA_ROCE_V1_UDP_SPORT_BASE
                                                       : IBGDA_ROCE_V2_UDP_SPORT_BASE);
    ah = ftable.create_ah(device->pd, &ah_attr);
    NVSHMEMI_NULL_ERROR_JMP(ah, status, NVSHMEMX_ERROR_INTERNAL, out, "Unable to create ah.\n");
    dv.ah.in = ah;
    dv.ah.out = &dah;
    mlx5dv_init_obj(&dv, MLX5DV_OBJ_AH);
    memcpy(DEVX_ADDR_OF(qpc, qpc, primary_address_path.rmac_47_32), &dah.av->rmac,
           sizeof(dah.av->rmac));
    DEVX_SET(qpc, qpc, primary_address_path.hop_limit, IBGDA_GRH_HOP_LIMIT);
    DEVX_SET(qpc, qpc, primary_address_path.src_addr_index,
             device->gid_info[portid - 1].local_gid_index);
    DEVX_SET(qpc, qpc, primary_address_path.eth_prio, ibgda_state->options->IB_SL);
    DEVX_SET(qpc, qpc, primary_address_path.udp_sport, ah_attr.dlid);
    DEVX_SET(qpc, qpc, primary_address_path.dscp, ibgda_state->options->IB_TRAFFIC_CLASS >> 2);
    ...
}
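
To make the question concrete, here is a rough sketch of what a per-QP source port could look like. It is not NVSHMEM code: the function name, the macros, and the mixing steps are all hypothetical, and I have not verified that the device accepts arbitrary values in this field. The idea is only to keep the port inside the 49152..65535 range while letting the low 14 bits vary with the local and remote QP numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not NVSHMEM identifiers. */
#define SKETCH_UDP_SPORT_BASE 0xC000u /* 49152, the value currently used for RoCE v2 */
#define SKETCH_UDP_SPORT_MASK 0x3FFFu /* low 14 bits left free for per-QP entropy */

/* Fold the local and remote QP numbers into a UDP source port in 49152..65535.
 * The mixing is an arbitrary choice for this sketch; any reasonable hash would do. */
static uint16_t sketch_udp_sport(uint32_t local_qpn, uint32_t remote_qpn) {
    /* QP numbers are 24-bit values. */
    assert(local_qpn <= 0xFFFFFFu && remote_qpn <= 0xFFFFFFu);

    uint64_t v = (uint64_t)local_qpn * remote_qpn;
    v ^= v >> 20;
    v ^= v >> 40;

    uint32_t fl = (uint32_t)(v & 0xFFFFFu);                   /* keep 20 bits */
    uint32_t low = (fl & SKETCH_UDP_SPORT_MASK) ^ (fl >> 14); /* fold the top 6 bits in */

    return (uint16_t)(low | SKETCH_UDP_SPORT_BASE);
}
```

Under that assumption, the result would be passed to DEVX_SET(qpc, qpc, primary_address_path.udp_sport, ...) in place of ah_attr.dlid. Because the product is symmetric in the two QP numbers, both ends of a connection would compute the same port.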