Does BlueField-2 support IBV_ATOMIC_GLOB?

I have written a small program that queries the device’s capabilities and checks its RDMA atomic capability. I ran it on both the host and the DPU, and the results suggest that both support only IBV_ATOMIC_HCA. Is this actually the case for the BF2 DPU, or am I missing something? Is there any way to turn on global atomic mode?

I am running the DPU in Embedded CPU mode.

ibv_devinfo output for the device whose capabilities I am checking on the DPU side. The 2 ports are 2 SFs:

hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         24.35.1012
        node_guid:                      b8ce:f603:00d2:132a
        sys_image_guid:                 b8ce:f603:00d2:1326
        vendor_id:                      0x02c9
        vendor_part_id:                 41686
        hw_ver:                         0x1
        board_id:                       MT_0000000560
        phys_port_cnt:                  255
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet
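
For reference, the atomic capability can be read from verbose ibv_devinfo output instead of a custom program. Below is a minimal sketch, assuming `ibv_devinfo -v` prints a line of the form `atomic_cap: ATOMIC_HCA (1)` (the output above is non-verbose and does not show it). The helper name `atomic_cap_from_devinfo` is hypothetical; the numeric values follow enum ibv_atomic_cap in libibverbs.

```python
import re

# Values of enum ibv_atomic_cap in libibverbs (verbs.h).
ATOMIC_CAPS = {0: "IBV_ATOMIC_NONE", 1: "IBV_ATOMIC_HCA", 2: "IBV_ATOMIC_GLOB"}


def atomic_cap_from_devinfo(text):
    """Return the IBV_ATOMIC_* name found in ibv_devinfo -v style text, or None."""
    # Assumed line shape: "atomic_cap:    ATOMIC_HCA (1)"
    m = re.search(r"atomic_cap:\s+\S+\s+\((\d+)\)", text)
    if m is None:
        return None
    return ATOMIC_CAPS.get(int(m.group(1)), "unknown")


# Hypothetical sample line, not taken from the output in this thread.
sample = "        atomic_cap:                     ATOMIC_HCA (1)\n"
print(atomic_cap_from_devinfo(sample))
```

On a real system, the text would come from running `ibv_devinfo -v -d mlx5_0` and passing its output to the helper.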

Update:

I came across this pull request. From this commit, it looks like IBV_ATOMIC_HCA is enough to provide global atomicity as long as the device supports PCIe atomics. But when I query the PCIe atomic capabilities with ibv_query_device_ex(), I get attr_ex.pci_atomic_caps.fetch_add = 0 and attr_ex.pci_atomic_caps.compare_swap = 0.
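
The fetch_add and compare_swap values are bitmasks of supported operand sizes, so a value of 0 means no size is reported for that operation. Here is a minimal sketch of decoding such a mask, assuming the bit layout of enum ibv_pci_atomic_op_size in libibverbs (4-byte = bit 0, 8-byte = bit 1, 16-byte = bit 2). The function name `supported_sizes` is hypothetical.

```python
# Bit flags of enum ibv_pci_atomic_op_size in libibverbs (verbs.h):
# 4-byte, 8-byte and 16-byte operand support.
SIZE_BITS = {1 << 0: 4, 1 << 1: 8, 1 << 2: 16}


def supported_sizes(mask):
    """Return the operand sizes (in bytes) whose bits are set in a PCI atomic caps mask."""
    return [size for bit, size in SIZE_BITS.items() if mask & bit]


# The values reported in this thread are 0 for both operations.
print(supported_sizes(0))
```

An empty list for both fetch_add and compare_swap corresponds to the driver reporting no PCIe atomic support for those operations.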

I have checked the device configuration parameter PCI_ATOMIC_MODE. It was set to PCI_ATOMICS_DISABLED_EXT_ATOMICS_ENABLED(0). I tried changing it to PCI_ATOMICS_ENABLED_EXT_ATOMICS_ENABLED(4), but the results are the same. Does that mean BF2 doesn’t support PCIe atomics?
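
To track the setting across reboots, the current value can be pulled out of the mlxconfig query output. This is a minimal sketch, assuming the query prints the parameter as `PCI_ATOMIC_MODE  NAME(number)` on one line, which matches the values quoted in this thread. The function name and the sample line are illustrative, not captured output.

```python
import re


def parse_pci_atomic_mode(text):
    """Return (name, number) for PCI_ATOMIC_MODE in mlxconfig query text, or None."""
    # Assumed line shape: "PCI_ATOMIC_MODE    SOME_NAME(0)"
    m = re.search(r"PCI_ATOMIC_MODE\s+(\w+)\((\d+)\)", text)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


# Illustrative line built from the value quoted above.
sample = "         PCI_ATOMIC_MODE                     PCI_ATOMICS_DISABLED_EXT_ATOMICS_ENABLED(0)"
print(parse_pci_atomic_mode(sample))
```

The sketch only extracts the value. Which modes actually enable PCIe atomics is left to the firmware documentation.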

CPU I am using: Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz

Hi Ash,

BF2 should support PCIe atomics.

Here is how to check whether PCIe atomics are available.

You need to check 3 components:

1. mlxconfig -d ca:00.0 q | grep PCI_ATOMIC_MODE

If this is PCI_ATOMICS_ENABLED_EXT_ATOMICS_ENABLED_SERIALIZED(1), the device is configured for PCI atomics.

2. AtomicOp Requester must be enabled on the NIC PCI device.

For example, it is enabled here: lspci -vvv -s ca:00.0

AtomicOpsCtl: ReqEn+ : AtomicOp Requester Enabled

3. If the NIC is connected via a PCI bridge, the AtomicOp completer must be enabled on the bridge.

Check the PCI tree to see whether the NIC is connected via a PCI bridge. If it is, find the PCI address of the bridge and check the “AtomicOp completer” configuration in “lspci -vvvxxx -s ”.

If all 3 are enabled, PCI atomic capability is enabled. If at least 1 is disabled, PCI atomic capability is disabled.

If there is no PCI bridge and the NIC is connected directly to a PCI slot on the server, 1 and 2 are enough.
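
The three checks above combine into a simple rule, sketched below. The function and parameter names are hypothetical; each flag has to be filled in by hand from the mlxconfig and lspci output.

```python
def pci_atomics_usable(mode_enables_pci_atomics, requester_enabled,
                       behind_bridge, bridge_completer_enabled=False):
    """Apply the checklist: checks 1 and 2 always, check 3 only behind a bridge."""
    # Checks 1 and 2: firmware mode and AtomicOp Requester on the NIC.
    if not (mode_enables_pci_atomics and requester_enabled):
        return False
    # Check 3 applies only when the NIC sits behind a PCI bridge.
    if behind_bridge:
        return bridge_completer_enabled
    return True


# NIC directly in a slot: checks 1 and 2 are enough.
print(pci_atomics_usable(True, True, behind_bridge=False))
# NIC behind a bridge whose completer is not enabled.
print(pci_atomics_usable(True, True, behind_bridge=True,
                         bridge_completer_enabled=False))
```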

Thank you
Meng, Shi

Hi Meng,

Thank you for the reply.

Initially, the device configuration parameter PCI_ATOMIC_MODE was set to PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0). I then changed it to PCI_ATOMICS_ENABLED_EXT_ATOMICS_ENABLED_SERIALIZED(1) and rebooted the machine and the DPU. I am still facing the same issue.

Here is the part of the PCI tree around the DPU:

 +-[0000:16]-+-00.0  Intel Corporation Ice Lake Memory Map/VT-d
 |           +-00.1  Intel Corporation Ice Lake Mesh 2 PCIe
 |           +-00.2  Intel Corporation Ice Lake RAS
 |           +-00.4  Intel Corporation Ice Lake IEH
 |           \-02.0-[17]--+-00.0  Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
 |                        +-00.1  Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
 |                        \-00.2  Mellanox Technologies MT42822 BlueField-2 SoC Management Interface
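
The upstream bridge can also be found without reading the tree by eye. On Linux, /sys/bus/pci/devices/<address> is a symlink into /sys/devices, and the parent directory of the resolved path is the upstream port. Below is a minimal sketch under that assumption; the function names are hypothetical, and the sample path is built from the addresses in the tree above rather than captured from the machine.

```python
import os
import re

# Full PCI address: domain:bus:device.function
BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_bridge(resolved_sysfs_path):
    """Return the parent PCI address from a resolved sysfs device path, or None."""
    parent = os.path.basename(os.path.dirname(resolved_sysfs_path.rstrip("/")))
    # A device directly under the root complex has a parent like "pci0000:16".
    return parent if BDF.match(parent) else None


def upstream_bridge_of(bdf):
    """Resolve the sysfs symlink for a device and return its upstream bridge."""
    return upstream_bridge(os.path.realpath(f"/sys/bus/pci/devices/{bdf}"))


# Assumed path shape for the BF2 function shown in the tree.
print(upstream_bridge("/sys/devices/pci0000:16/0000:16:02.0/0000:17:00.0"))
```

On the host, `upstream_bridge_of("0000:17:00.0")` would perform the symlink resolution itself.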

So, it looks like the BF2 is connected through the PCIe bridge 0000:16:02.0. Here is the output of lspci -vvv -s 0000:16:02.0 (I am printing the full output to make sure I am not missing anything):

16:02.0 PCI bridge: Intel Corporation Device 347a (rev 04) (prog-if 00 [Normal decode])
        DeviceName: SLOT 2
        Physical Slot: 2
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 133
        NUMA node: 0
        Region 0: Memory at 207ffe000000 (64-bit, non-prefetchable) [size=128K]
        Bus: primary=16, secondary=17, subordinate=17, sec-latency=0
        I/O behind bridge: 0000f000-00000fff [disabled]
        Memory behind bridge: 9b800000-9b8fffff [size=1M]
        Prefetchable memory behind bridge: 0000207ffa000000-0000207ffdffffff [size=64M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 512 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 16GT/s, Width x16, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #2, PowerLimit 75.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd Off, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                RootCap: CRSVisible+
                RootCtl: ErrCorrectable- ErrNon-Fatal+ ErrFatal+ PMEIntEna+ CRSVisible+
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+, NROPrPrP+, LTR-
                         10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, LN System CLS Not Supported, TPHComp+, ExtTPHComp-, ARIFwd+
                         AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS+
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd+
                         AtomicOpsCtl: ReqEn+ EgressBlck-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [88] Subsystem: Intel Corporation Device 0000
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00018  Data: 0000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol+
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap+
                HeaderLog: 4a000002 17020008 fc300048 00000000
                RootCmd: CERptEn- NFERptEn- FERptEn-
                RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
                         FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
                ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
        Capabilities: [148 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [180 v1] Vendor Specific Information: ID=0003 Rev=0 Len=00a <?>
        Capabilities: [190 v1] Downstream Port Containment
                DpcCap: INT Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
                DpcCtl: Trigger:0 Cmpl- INT- ErrCor- PoisonedTLP- SwTrigger- DL_ActiveErr-
                DpcSta: Trigger- Reason:00 INT- RPBusy- TriggerExt:00 RP PIO ErrPtr:10
                Source: 0000
        Capabilities: [200 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [400 v1] Data Link Feature <?>
        Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [450 v1] Lane Margining at the Receiver <?>
        Kernel driver in use: pcieport

I couldn’t find anything like “AtomicOp completer”. But the following AtomicOp-related capabilities are there:

AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS+
AtomicOpsCtl: ReqEn+ EgressBlck-

Does that mean the PCIe bridge doesn’t support AtomicOp completer?
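
Picking these two lines out of long lspci dumps is easy to automate. Below is a minimal sketch that collects the +/- flags from the AtomicOpsCap and AtomicOpsCtl fields; the function name `atomic_op_flags` is hypothetical. As far as I know, lspci prints the 32bit, 64bit and 128bitCAS entries from the AtomicOp completer-supported bits of Device Capabilities 2, and ReqEn from the AtomicOp Requester Enable bit of Device Control 2, but treat that mapping as an assumption to verify against the PCIe specification.

```python
import re


def atomic_op_flags(lspci_text):
    """Collect AtomicOpsCap/AtomicOpsCtl flags from lspci -vvv text as booleans."""
    flags = {}
    for line in lspci_text.splitlines():
        # The field may sit on its own line or at the end of a DevCap2/DevCtl2 line.
        m = re.search(r"(AtomicOpsCap|AtomicOpsCtl):\s*(.*)", line)
        if not m:
            continue
        for name, sign in re.findall(r"(\w+)([+-])", m.group(2)):
            flags[f"{m.group(1)}.{name}"] = (sign == "+")
    return flags


# The two lines quoted above for the bridge 0000:16:02.0.
bridge = """
                         AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS+
                         AtomicOpsCtl: ReqEn+ EgressBlck-
"""
print(atomic_op_flags(bridge))
```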

Here is the lspci output for the BF2 device after setting PCI_ATOMIC_MODE to PCI_ATOMICS_ENABLED_EXT_ATOMICS_ENABLED_SERIALIZED(1):

lspci -vvv -s 0000:17:0.0:

17:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
        Subsystem: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 18
        NUMA node: 0
        Region 0: Memory at 207ffc000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
                DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+, NROPrPrP-, LTR-
                         10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn+
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [48] Vital Product Data
                Product Name: BlueField-2 DPU 100GbE Dual-Port QSFP56, Crypto Disabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket
                Read-only fields:
                        [PN] Part number: MBF2M516A-CENOT
                        [EC] Engineering changes: B2
                        [V2] Vendor specific: MBF2M516A-CENOT
                        [SN] Serial number: MT2125X06705
                        [V3] Vendor specific: aae27201e8d3eb118000b8cef6d21326
                        [VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=BF2M516A
                        [V0] Vendor specific: PCIeGen4 x16
                        [VU] Vendor specific: MT2125X06705MLNXS0D0F0
                        [RV] Reserved: checksum good, 1 byte(s) reserved
                End
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
                AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [1c0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [230 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [320 v1] Lane Margining at the Receiver <?>
        Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [420 v1] Data Link Feature <?>
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

The following capabilities relate to PCIe AtomicOp:

AtomicOpsCap: 32bit- 64bit- 128bitCAS-
AtomicOpsCtl: ReqEn+

Let me know what you think. Thank you.

Hi Ash,

Your bridge supports the AtomicOp completer, but you didn’t enable it.
For details, see: Search · AtomicOp · GitHub

Thank you
Meng, Shi

Hello Meng,

I have looked at the code you shared. I am not sure what you mean by having to enable it. Here is the lspci output of the bridge from my previous post:

AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS+
AtomicOpsCtl: ReqEn+ EgressBlck-

From this output, it looks like on the bridge (1) AtomicOp routing is enabled and (2) the 32-, 64- and 128-bit AtomicOp completers are enabled.

Do I need to enable any additional configuration on the bridge?

And here is the output from the BF2 device:

AtomicOpsCap: 32bit- 64bit- 128bitCAS-
AtomicOpsCtl: ReqEn+

The AtomicOp Requester is enabled and the AtomicOp completers are disabled. Do I need to enable the AtomicOp completer on the BF2 device too?

Thanks,
Ashfaq

Hi Ash,

It seems we need some time to sync on this information.
I suggest you contact networking-support@nvidia.com for further debugging.

Thank you
Meng, Shi

