Help with NVLink and a pair of 2080 Ti cards?

I am using a pair of MSI 2080 Ti cards and an ASUS RTX NVLink bridge on a system running Ubuntu 18.04. nvidia-smi invocations run very slowly, and Caffe and the CUDA example programs misbehave.

I am seeing the following output, which seems wrong. Shouldn't I see two active links (0 and 1) on each GPU? I would also expect “nvidia-smi nvlink” to show both Link 0 and Link 1, and “nvidia-smi topo --matrix” to show NV2 links rather than NV1 links. I suspect a hardware problem in the NVLink connection between the cards.

Is this output strange or is my understanding incorrect? Any help much appreciated!

I am running nvidia-persistenced as a daemon under my user account ID.

$ nvidia-smi nvlink --status # Takes over a minute to finish running.
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-dd1093e0-466f-7322-e214-351b015045d9)
Link 0: 25.781 GB/s
Link 1: <inactive>
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-2a386612-018c-e3fe-3fd4-1dde588af45d)
Link 0: 25.781 GB/s
Link 1: <inactive>
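To make the comparison with the expected "two active links per GPU" easier to repeat, here is a minimal sketch that counts active links from the status text. It assumes the output format is exactly what is pasted above (one "GPU n:" line followed by "Link n:" lines, with "<inactive>" marking a dead link). The names SAMPLE and active_links are made up for illustration and are not part of nvidia-smi.

```python
import re

# Text copied from the `nvidia-smi nvlink --status` output above.
SAMPLE = """\
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-dd1093e0-466f-7322-e214-351b015045d9)
Link 0: 25.781 GB/s
Link 1: <inactive>
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-2a386612-018c-e3fe-3fd4-1dde588af45d)
Link 0: 25.781 GB/s
Link 1: <inactive>
"""

GPU_RE = re.compile(r"^\s*GPU (\d+):")
LINK_RE = re.compile(r"^\s*Link (\d+):\s*(.+?)\s*$")


def active_links(status_text):
    """Return {gpu_index: [indices of links not reported as <inactive>]}."""
    result = {}
    current = None
    for line in status_text.splitlines():
        gpu = GPU_RE.match(line)
        if gpu:
            # Start collecting links for a new GPU section.
            current = int(gpu.group(1))
            result[current] = []
            continue
        link = LINK_RE.match(line)
        if link and current is not None and link.group(2) != "<inactive>":
            result[current].append(int(link.group(1)))
    return result


if __name__ == "__main__":
    for gpu, links in active_links(SAMPLE).items():
        print(f"GPU {gpu}: {len(links)} active link(s): {links}")
```

On the text above this reports one active link (Link 0) per GPU, which is the behaviour I am asking about.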

$ nvidia-smi nvlink -c # Takes about two minutes to finish running.
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-dd1093e0-466f-7322-e214-351b015045d9)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-2a386612-018c-e3fe-3fd4-1dde588af45d)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false

$ nvidia-smi nvlink --capabilities # Takes several minutes to finish running.
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-dd1093e0-466f-7322-e214-351b015045d9)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-2a386612-018c-e3fe-3fd4-1dde588af45d)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false

$ nvidia-smi topo --matrix # Takes over a minute to finish running.

        GPU0    GPU1    CPU Affinity
GPU0     X      NV1     0-11
GPU1    NV1      X      0-11

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
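Going by the legend's "NV#" entry, the number after "NV" is the count of bonded NVLinks between two GPUs. A rough sketch of pulling that count out of the matrix rows follows, assuming whitespace-separated rows like the ones above. The names SAMPLE_ROWS and bonded_nvlinks are hypothetical helpers, not anything provided by nvidia-smi.

```python
import re

# Matrix rows copied from the `nvidia-smi topo --matrix` output above.
SAMPLE_ROWS = """\
GPU0 X NV1 0-11
GPU1 NV1 X 0-11
"""


def bonded_nvlinks(matrix_rows):
    """Return {(row_gpu, col_gpu): n} for every NV<n> cell in the rows."""
    links = {}
    for line in matrix_rows.splitlines():
        fields = line.split()
        if not fields or not re.fullmatch(r"GPU\d+", fields[0]):
            continue  # skip blank lines and the legend
        row = int(fields[0][3:])
        # Cells after the row label are assumed to be in GPU column order
        # (GPU0, GPU1, ...); the trailing CPU affinity never matches NV<n>.
        for col, cell in enumerate(fields[1:]):
            match = re.fullmatch(r"NV(\d+)", cell)
            if match:
                links[(row, col)] = int(match.group(1))
    return links


if __name__ == "__main__":
    for (a, b), n in sorted(bonded_nvlinks(SAMPLE_ROWS).items()):
        print(f"GPU{a} -> GPU{b}: {n} bonded NVLink(s)")
```

For the matrix above this gives 1 in both directions, where I expected 2.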