mlnx_qos cannot assign priority values to TCs when more than 8 SR-IOV VFs are enabled

Hi,

I have a problem with mlnx_qos on ConnectX-4 NICs. It works perfectly when 8 SR-IOV VFs are enabled. After changing NUM_OF_VFS from 8 to 32 with the mlxconfig tool and rebooting, mlnx_qos can no longer assign priority values to TCs; it fails with “Netlink error: Bad value. see dmesg”, but nothing is displayed in dmesg. The default output has also changed from this:

tc: 0 ratelimit: unlimited, tsa: vendor
    priority: 1
tc: 1 ratelimit: unlimited, tsa: vendor
    priority: 0
tc: 2 ratelimit: unlimited, tsa: vendor
    priority: 2
tc: 3 ratelimit: unlimited, tsa: vendor
    priority: 3
tc: 4 ratelimit: unlimited, tsa: vendor
    priority: 4
tc: 5 ratelimit: unlimited, tsa: vendor
    priority: 5
tc: 6 ratelimit: unlimited, tsa: vendor
    priority: 6
tc: 7 ratelimit: unlimited, tsa: vendor
    priority: 7

to this:

tc: 0 ratelimit: unlimited, tsa: vendor
    priority: 0
    priority: 1
    priority: 2
    priority: 3
    priority: 4
    priority: 5
    priority: 6
    priority: 7
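For reference, the reconfiguration described above amounts to something like the following (the device path is the one from the mlxconfig dump later in this thread; the interface name is from my setup):

```shell
# Raise the firmware VF limit (takes effect only after a reboot) ...
mlxconfig -d /dev/mst/mt4115_pciconf0 set NUM_OF_VFS=32
# ... then, after the reboot, actually instantiate the VFs on the port.
echo 32 > /sys/class/net/enp6s0f1/device/sriov_numvfs
```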

I could not figure out what the problem might be. To me it looks like a bug in the FW or in the driver. But if not, what am I missing?

Thanks in advance.

David

Hi Steve,

I’m using the latest FW and MOFED as well.

I’ve issued the command you advised; it returned the correct output, but all priority levels are stacked under TC 0, whereas by default they are spread across all the TCs.

The output:

mlnx_qos -i enp6s0f1 --pfc 0,0,0,1,0,0,0,0
DCBX mode: OS controlled
Priority trust state: pcp
Cable len: 7
PFC configuration:
    priority    0   1   2   3   4   5   6   7
    enabled     0   0   0   1   0   0   0   0
tc: 0 ratelimit: unlimited, tsa: vendor
    priority: 0
    priority: 1
    priority: 2
    priority: 3
    priority: 4
    priority: 5
    priority: 6
    priority: 7

Thanks,

David

Hi Steve,

the default output is the same as shown above, apart from the PFC setting from your previous post:

DCBX mode: OS controlled
Priority trust state: pcp
Cable len: 7
PFC configuration:
    priority    0   1   2   3   4   5   6   7
    enabled     0   0   0   1   0   0   0   0
tc: 0 ratelimit: unlimited, tsa: vendor
    priority: 0
    priority: 1
    priority: 2
    priority: 3
    priority: 4
    priority: 5
    priority: 6
    priority: 7

And yes, when I try to map priority values to ETCs I get the error. The output with -p (or --prio_tc) is as follows:

sudo mlnx_qos -i enp6s0f1 -p 0,1,2,3,4,5,6,7
Netlink error: Bad value. see dmesg.

Thank you, David

Hello David -

Could you paste your dmesg output?

thanks - steve

Hello David -

I hope all is well…

Could you open a case with Mellanox Support so we can take a deeper look at this issue?

Thank you -

Steve

Hello David -

Unfortunately, we cannot make apples-to-oranges comparisons to help here.

I really need the entire dmesg log…

thanks - steve

Hi Steve,

another update: I have tried my settings on another server, with different, HP-branded NICs (HP_2690110034) and different firmware.

The card type is

Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

With these NICs everything works without any problem.

Maybe the problem is specific to the MT27700.

Thanks,

David

What is the exact command you are using?

thanks - steve

Hi Steve,

sorry for not answering sooner; I was out of the office.

The NIC type according to lspci is MT27700

The command I used:

sudo mlnx_qos -i enp6s0f1 -p 0,1,2,3,4,5,6,7

The output: Netlink error: Bad value. see dmesg.

And here is the output of dmesg related to mlx5:

[   11.872656] mlx5_core 0000:06:00.0: firmware version: 12.21.2010
[   13.604119] mlx5_core 0000:06:00.0: Port module event: module 0, Cable plugged
[   13.614789] mlx5_core 0000:06:00.1: firmware version: 12.21.2010
[   15.059230] mlx5_core 0000:06:00.1: Port module event: module 1, Cable plugged
[   15.148425] mlx5_core 0000:06:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[   15.442744] mlx5_core 0000:06:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[   15.762845] mlx5_core 0000:06:00.0 enp6s0f0: renamed from eth0
[   15.820204] mlx5_core 0000:06:00.1 enp6s0f1: renamed from eth1
[   28.629317] mlx5_ib: Mellanox Connect-IB Infiniband driver v4.2-1.2.0
[  215.304044] mlx5_core 0000:06:00.1 enp6s0f1: mlx5e_update_carrier:153: Link up

I am also attaching the output of mlxconfig:

Device #1:

Device type:    ConnectX4
Name:           N/A
Description:    N/A
Device:         /dev/mst/mt4115_pciconf0

Configurations:                          Next Boot
    MEMIC_BAR_SIZE                       0
    MEMIC_SIZE_LIMIT                     _256KB(1)
    ROCE_NEXT_PROTOCOL                   254
    NON_PREFETCHABLE_PF_BAR              False(0)
    NUM_OF_VFS                           32
    FPP_EN                               True(1)
    SRIOV_EN                             True(1)
    PF_LOG_BAR_SIZE                      5
    VF_LOG_BAR_SIZE                      0
    NUM_PF_MSIX                          63
    NUM_VF_MSIX                          11
    INT_LOG_MAX_PAYLOAD_SIZE             AUTOMATIC(0)
    CQE_COMPRESSION                      BALANCED(0)
    IP_OVER_VXLAN_EN                     False(0)
    LRO_LOG_TIMEOUT0                     6
    LRO_LOG_TIMEOUT1                     7
    LRO_LOG_TIMEOUT2                     8
    LRO_LOG_TIMEOUT3                     12
    LOG_DCR_HASH_TABLE_SIZE              14
    DCR_LIFO_SIZE                        16384
    ROCE_CC_PRIO_MASK_P1                 255
    ROCE_CC_ALGORITHM_P1                 ECN(0)
    ROCE_CC_PRIO_MASK_P2                 255
    ROCE_CC_ALGORITHM_P2                 ECN(0)
    CLAMP_TGT_RATE_AFTER_TIME_INC_P1     True(1)
    CLAMP_TGT_RATE_P1                    False(0)
    RPG_TIME_RESET_P1                    300
    RPG_BYTE_RESET_P1                    32767
    RPG_THRESHOLD_P1                     1
    RPG_MAX_RATE_P1                      0
    RPG_AI_RATE_P1                       5
    RPG_HAI_RATE_P1                      50
    RPG_GD_P1                            11
    RPG_MIN_DEC_FAC_P1                   50
    RPG_MIN_RATE_P1                      1
    RATE_TO_SET_ON_FIRST_CNP_P1          0
    DCE_TCP_G_P1                         1019
    DCE_TCP_RTT_P1                       1
    RATE_REDUCE_MONITOR_PERIOD_P1        4
    INITIAL_ALPHA_VALUE_P1               1023
    MIN_TIME_BETWEEN_CNPS_P1             0
    CNP_802P_PRIO_P1                     6
    CNP_DSCP_P1                          48
    CLAMP_TGT_RATE_AFTER_TIME_INC_P2     True(1)
    CLAMP_TGT_RATE_P2                    False(0)
    RPG_TIME_RESET_P2                    300
    RPG_BYTE_RESET_P2                    32767
    RPG_THRESHOLD_P2                     1
    RPG_MAX_RATE_P2                      0
    RPG_AI_RATE_P2                       5
    RPG_HAI_RATE_P2                      50
    RPG_GD_P2                            11
    RPG_MIN_DEC_FAC_P2                   50
    RPG_MIN_RATE_P2                      1
    RATE_TO_SET_ON_FIRST_CNP_P2          0
    DCE_TCP_G_P2                         1019
    DCE_TCP_RTT_P2                       1
    RATE_REDUCE_MONITOR_PERIOD_P2        4
    INITIAL_ALPHA_VALUE_P2               1023
    MIN_TIME_BETWEEN_CNPS_P2             0
    CNP_802P_PRIO_P2                     6
    CNP_DSCP_P2                          48
    LLDP_NB_DCBX_P1                      False(0)
    LLDP_NB_RX_MODE_P1                   OFF(0)
    LLDP_NB_TX_MODE_P1                   OFF(0)
    LLDP_NB_DCBX_P2                      False(0)
    LLDP_NB_RX_MODE_P2                   OFF(0)
    LLDP_NB_TX_MODE_P2                   OFF(0)
    DCBX_IEEE_P1                         True(1)
    DCBX_CEE_P1                          True(1)
    DCBX_WILLING_P1                      True(1)
    DCBX_IEEE_P2                         True(1)
    DCBX_CEE_P2                          True(1)
    DCBX_WILLING_P2                      True(1)
    KEEP_ETH_LINK_UP_P1                  True(1)
    KEEP_IB_LINK_UP_P1                   False(0)
    KEEP_LINK_UP_ON_BOOT_P1              False(0)
    KEEP_LINK_UP_ON_STANDBY_P1           False(0)
    KEEP_ETH_LINK_UP_P2                  True(1)
    KEEP_IB_LINK_UP_P2                   False(0)
    KEEP_LINK_UP_ON_BOOT_P2              False(0)
    KEEP_LINK_UP_ON_STANDBY_P2           False(0)
    NUM_OF_VL_P1                         _4_VLs(3)
    NUM_OF_TC_P1                         _8_TCs(0)
    NUM_OF_PFC_P1                        8
    NUM_OF_VL_P2                         _4_VLs(3)
    NUM_OF_TC_P2                         _8_TCs(0)
    NUM_OF_PFC_P2                        8
    DUP_MAC_ACTION_P1                    LAST_CFG(0)
    SRIOV_IB_ROUTING_MODE_P1             False(0)
    IB_ROUTING_MODE_P1                   LID(1)
    DUP_MAC_ACTION_P2                    LAST_CFG(0)
    SRIOV_IB_ROUTING_MODE_P2             False(0)
    IB_ROUTING_MODE_P2                   LID(1)
    MULTI_PORT_VHCA_EN                   False(0)
    PORT_OWNER                           True(1)
    ALLOW_RD_COUNTERS                    True(1)
    RENEG_ON_CHANGE                      True(1)
    TRACER_ENABLE                        False(0)
    IP_VER                               IPv4(0)
    UEFI_HII_EN                          True(1)
    BOOT_VLAN                            1
    LEGACY_BOOT_PROTOCOL                 PXE(1)
    BOOT_RETRY_CNT                       NONE(0)
    BOOT_LACP_DIS                        False(0)
    BOOT_VLAN_EN                         False(0)
    BOOT_PKEY                            0
    UEFI_BOOT_DBG_LOG_TCP                DISABLE(0)
    UEFI_BOOT_DBG_LOG_TCPIP              DISABLE(0)
    UEFI_BOOT_DBG_LOG_IP                 DISABLE(0)
    UEFI_BOOT_DBG_LOG_IPV6               DISABLE(0)
    UEFI_BOOT_DBG_LOG_DRIVER_SETTINGS    DISABLE(0)
    UEFI_BOOT_DBG_LOG_STATUS             DISABLE(0)
    UEFI_BOOT_DBG_LOG_PXE_UNDI           DISABLE(0)
    ADVANCED_PCI_SETTINGS                False(0)

Hello David -

lspci output?

thanks - steve

And what does ibdev2netdev say?

Hello David -

I hope all is well…

Please try the following :

mlnx_qos -i eth2 -p 0,1,2,3,4,5,6,7

-p, --prio_tc: maps priorities to Egress Traffic Classes (ETCs).

Note: by default, priority 0 is mapped to tc1 and priority 1 is mapped to tc0; every other priority is mapped to the TC with the same number.
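Purely as an illustration (this helper is not part of mlnx_qos), here is a quick shell check that a --prio_tc argument has the expected shape, i.e. exactly 8 comma-separated TC indices, each in 0..7:

```shell
# Hypothetical pre-flight check for a --prio_tc string:
# it must be exactly 8 comma-separated digits in the range 0-7.
prio_tc="0,1,2,3,4,5,6,7"
if printf '%s' "$prio_tc" | grep -Eq '^([0-7],){7}[0-7]$'; then
    echo "prio_tc looks well formed: $prio_tc"
else
    echo "prio_tc is malformed: $prio_tc"
fi
```

Your string passes this check, so if you still get "Bad value" the rejection is coming from the driver or firmware side, not from the argument format.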

Please let me know your results…

Thanks - steve

Have you looked at HowTo Configure QoS over SR-IOV https://community.mellanox.com/s/article/howto-configure-qos-over-sr-iov

or searched for mlnx_qos in the community docs?

thanks - steve

Hello David

could you try:

mlnx_qos -i <interface> --pfc 0,0,0,1,0,0,0,0

Lossless RoCE Configuration for Linux Drivers in DSCP-Based QoS Mode https://community.mellanox.com/s/article/lossless-roce-configuration-for-linux-drivers-in-dscp-based-qos-mode

thanks - steve

Hello David -

Please ensure you are using the latest MOFED and FW.

http://www.mellanox.com/page/products_dyn?product_family=201&mtag=connectx_4_vpi_card

http://www.mellanox.com/page/products_dyn?product_family=204&mtag=connectx_4_en_card

http://www.mellanox.com/page/products_dyn?product_family=27

Also please check the following links regarding QoS:

https://community.mellanox.com/s/article/mlnx-qos

https://community.mellanox.com/s/article/howto-configure-pfc-on-connectx-4

https://community.mellanox.com/s/article/how-to-configure-roce-over-a-lossless-fabric--pfc---ecn--end-to-end-using-connectx-4-and-spectrum--trust-l2-x

thanks - steve

Hi David,

Could you shed some light on what you are trying to achieve? Why are you configuring SR-IOV? Are you going to use VLANs? Are you planning to run VMs on it, and then RDMA traffic over them? Understanding the whole picture of the final result would be extremely helpful.

Hi,

yes, I’m going to use VLANs. We have a small network of containers, and we assign an SR-IOV VF to each of them. At the moment we are using two servers connected back-to-back. In the containers we run both bandwidth-hungry and latency-sensitive applications. By setting QoS rules we would like to reduce the message-passing latencies of the latency-sensitive traffic, which increase in the presence of the bandwidth-hungry traffic. We need VLANs because of the PCP header field, which differentiates the two types of traffic. However, running only 8 containers per server is not a real-world scenario, I think.
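For completeness, the PCP-based differentiation we rely on is set up roughly like this (the interface name and VLAN id below are placeholders for our actual values):

```shell
# Sketch: create a VLAN interface on the PF and map Linux socket
# priorities 0-7 one-to-one onto the PCP field of outgoing VLAN tags.
ip link add link enp6s0f1 name enp6s0f1.100 type vlan id 100 \
    egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
ip link set enp6s0f1.100 up
```

With this mapping in place, an application can steer its frames into a given PCP value (and hence a given TC) by setting SO_PRIORITY on its socket.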

Note: there was a previous question that seemed to be solved. It concerned the fact that we cannot assign priority values to the TCs of SR-IOV VFs (the VFs are not configurable at all with the mlnx_qos tool), but according to our measurements the priority settings of the PF are inherited by the VFs, so it works out.
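Since the VFs themselves are not configurable with mlnx_qos, one thing that can still be done from the PF side is to pin a VLAN tag and PCP value onto each VF's traffic (the VF index, VLAN id, and priority below are illustrative placeholders):

```shell
# Sketch: from the PF, force VF 0's frames into VLAN 100 with PCP 3,
# so its traffic lands in whatever TC priority 3 is mapped to.
ip link set enp6s0f1 vf 0 vlan 100 qos 3
```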

Thanks,

David
