Link Flapping for 6s on DGX - Have you seen this?

I’m seeing a link flap in the logs on a DGX where the interface goes down for about 6s (logs from the DGX and the switch are below). The same pattern is happening on dozens of nodes, and the log below is typical. Have you seen this before? I suspect a software issue: if it were cabling or hardware, I would expect some randomness in the timing and the downtime, not the same roughly 6s every time. I would like to get the community’s thoughts on this.

System Information
Manufacturer: NVIDIA
Product Name: DGXA100 920-23687-2531-001
Version: v1.0
Serial Number: xxxxxxxxxxxxxxx
UUID: be9ee235-ff5c-03ca-1000-1565ace0aac0
Wake-up Type: Power Switch
SKU Number: Default string
Family: DGXA100

The specific NIC I’m interested in is running:

driver: mlx5_core
version: 5.4-3.1.0
firmware-version: 20.32.1010 (MT_0000000225)

Regarding the link flapping, I can see a clear indication of the event on both the DGX node and the upstream switch. The flap follows a common pattern, typically about 6s between going down and coming up again (the timestamps below, at one-second resolution, show 7s). The DGX node side port goes down/down while the switch side port goes up/down. In almost every case so far (I have looked at 7 instances of flapping), the order of events is the same: the node reports the link down first, then the upstream fee switch reports the port’s up/down status. Here is a typical example:

Log Source Type Date Time Message

DGX-NODE node 1/27/2022 16:38:27 mlx5_core 0000:e1:00.0 enp225s0f0: Link down

DGX-NODE node 1/27/2022 16:38:27 front0: (slave enp225s0f0): speed changed to 0 on port 2

DGX-NODE node 1/27/2022 16:38:27 front0: (slave enp225s0f0): link status definitely down, disabling slave

DGX-NODE node 1/27/2022 16:38:34 mlx5_core 0000:e1:00.0 enp225s0f0: Link up

DGX-NODE node 1/27/2022 16:38:34 front0: (slave enp225s0f0): link status definitely up, 200000 Mbps full duplex

DGX-NODE node 1/27/2022 16:38:34 front0: (slave enp225s0f0): speed changed to 0 on port 2

arista-switch switch 1/27/2022 16:38:49 Ebra: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet4/4/1, changed state to down

arista-switch switch 1/27/2022 16:38:49 Lag: %LAG-5-MEMBER_REMOVED: Interface Ethernet4/4/1 has left Port-Channel50 due to: partner not in sync

arista-switch switch 1/27/2022 16:38:56 Ebra: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet4/4/1, changed state to up

arista-switch switch 1/27/2022 16:38:56 Lag: %LAG-5-MEMBER_ADDED: Interface Ethernet4/4/1 has joined Port-Channel50

arista-switch switch 1/27/2022 16:38:57 Lldp: %LLDP-5-NEIGHBOR_NEW: LLDP neighbor with chassisId b8ce.f616.ff8a and portId "enp225s0f0" added on interface Ethernet4/4/1
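
To check whether the downtime really is the same on every node, the down/up pairs can be extracted from the node logs and compared. Below is a minimal sketch of how that might look, assuming the log lines are in exactly the "source type date time message" layout shown above. The function name flap_durations and the regular expression are my own and not part of any NVIDIA or Arista tooling.

```python
import re
from datetime import datetime
from statistics import mean, pstdev

# Assumed line layout (taken from the log excerpt in this post):
# "DGX-NODE node 1/27/2022 16:38:27 mlx5_core 0000:e1:00.0 enp225s0f0: Link down"
LINK_RE = re.compile(
    r"^(?P<host>\S+)\s+\S+\s+"
    r"(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"mlx5_core\s+\S+\s+(?P<iface>\S+): Link (?P<state>up|down)$"
)


def flap_durations(lines):
    """Return (host, iface, seconds_down) for each Link down -> Link up pair."""
    down_since = {}  # (host, iface) -> timestamp of the last "Link down"
    flaps = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if not m:
            continue  # bonding messages, switch messages, blank lines
        key = (m["host"], m["iface"])
        ts = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S")
        if m["state"] == "down":
            down_since[key] = ts
        elif key in down_since:
            seconds = (ts - down_since.pop(key)).total_seconds()
            flaps.append((key[0], key[1], seconds))
    return flaps


# The node-side lines from the excerpt above.
sample = [
    "DGX-NODE node 1/27/2022 16:38:27 mlx5_core 0000:e1:00.0 enp225s0f0: Link down",
    "DGX-NODE node 1/27/2022 16:38:27 front0: (slave enp225s0f0): link status definitely down, disabling slave",
    "DGX-NODE node 1/27/2022 16:38:34 mlx5_core 0000:e1:00.0 enp225s0f0: Link up",
    "DGX-NODE node 1/27/2022 16:38:34 front0: (slave enp225s0f0): link status definitely up, 200000 Mbps full duplex",
]

flaps = flap_durations(sample)
durations = [seconds for _, _, seconds in flaps]
print(flaps)
if durations:
    # A spread near zero across many nodes would support the "not random" observation.
    print(f"mean={mean(durations):.1f}s stdev={pstdev(durations):.1f}s")
```

Run against the logs from all affected nodes, a standard deviation close to zero would back up the impression that the downtime is fixed rather than random. Note that the result is only as precise as the one-second timestamps in the logs.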