No peer-to-peer access and slow CUDA execution with NVLink bridge installed on Ubuntu 20.04

Hi,

I am trying to use two RTX 2070 Super cards with an NVLink bridge in an AMD Threadripper 1950X system, but peer-to-peer access does not work at all:
./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, GeForce RTX 2070 SUPER, pciBusID: a, pciDeviceID: 0, pciDomainID:0
Device: 1, GeForce RTX 2070 SUPER, pciBusID: 41, pciDeviceID: 0, pciDomainID:0
Device=0 CANNOT Access Peer Device=1
Device=1 CANNOT Access Peer Device=0

nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      SYS     0-31            N/A
GPU1    SYS      X      0-31            N/A
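
In the topology matrix above, SYS means the path between the two GPUs goes over PCIe and the CPU interconnect; an active NVLink connection would be listed as NV followed by the number of links. As a minimal sketch, the matrix can be checked mechanically with a few lines of Python. The function names `parse_topo_matrix` and `uses_nvlink` and the `TOPO_SAMPLE` constant are hypothetical helpers made up for this illustration, and the parser assumes the simple whitespace-separated layout shown in this post, with the GPU columns first.

```python
# Sketch: read the GPU-to-GPU link types out of `nvidia-smi topo -m` text.
# Assumes GPU columns come first and cells are whitespace-separated.

TOPO_SAMPLE = """\
GPU0 GPU1 CPU Affinity NUMA Affinity
GPU0 X SYS 0-31 N/A
GPU1 SYS X 0-31 N/A
"""


def parse_topo_matrix(text):
    """Return {(row_gpu, col_gpu): link_type} for every distinct GPU pair."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    # Only the GPU columns of the header matter for the GPU-to-GPU matrix.
    gpu_columns = [token for token in rows[0] if token.startswith("GPU")]
    links = {}
    for row in rows[1:]:
        # Skip legend lines or anything else that is not a GPU row.
        if not row[0].startswith("GPU"):
            continue
        cells = row[1:1 + len(gpu_columns)]
        for column_name, cell in zip(gpu_columns, cells):
            if column_name != row[0]:  # ignore the "X" self entry
                links[(row[0], column_name)] = cell
    return links


def uses_nvlink(link_type):
    """NVLink entries are reported as NV1, NV2, ...; SYS, NODE, PHB, PXB, PIX are PCIe paths."""
    return link_type.startswith("NV")


if __name__ == "__main__":
    for pair, link in parse_topo_matrix(TOPO_SAMPLE).items():
        status = "NVLink" if uses_nvlink(link) else "no NVLink"
        print(pair, link, status)
```

To check a live system instead of the pasted sample, the text could be obtained with `subprocess.run(["nvidia-smi", "topo", "-m"], capture_output=True, text=True).stdout` and passed to the same function.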

In addition, I noticed that any application that uses the GPU takes significantly longer to run when the bridge is installed.

The software environment is a freshly installed Ubuntu 20.04 Server with CUDA Toolkit 11, installed via the network installer from the NVIDIA website. The NVIDIA driver version is 450.51.06.

The system consists of the following:
AMD Threadripper 1950x
ASUS Prime X399-A Mainboard
64 GB RAM
2x Gigabyte RTX2070 Super + Gigabyte NVLink Bridge

Any idea how to fix the problem?

Best regards,
Ralf Seidler