Standard NVIDIA tests fail using dual RTX 4090 GPUs

Continuing the discussion from Standard nVidia CUDA tests fail with dual RTX 4090 Linux box:

We have the same issue on a machine with 2x RTX 4090 (Driver Version: 525.85.12, CUDA Version: 12.0).
We first noticed it when running a distribution test with TensorFlow.

Both tests reported above fail in the same way for us
(see NVIDIA/cuda-samples.git on GitHub; new users are limited to one link):

  • Samples/0_Introduction/simpleP2P - Test failed!
  • Samples/0_Introduction/simpleIPC - Verification mismatch at 0: 1 != 0…
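
Since both failing samples depend on peer-to-peer access between the two GPUs, it may help to also post what the driver itself reports about topology and P2P capability. The following is only a sketch: `nvidia-smi topo -m` and `nvidia-smi topo -p2p r` are existing nvidia-smi subcommands, while the function name `show_p2p_info` and the `NVSMI` override variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the driver's view of GPU topology and P2P read capability.
# NVSMI is an assumed override for the tool path; it defaults to nvidia-smi.
show_p2p_info() {
    tool="${NVSMI:-nvidia-smi}"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool not found in PATH" >&2
        return 1
    fi
    # Connection matrix between the GPUs (PCIe / NUMA relationships).
    "$tool" topo -m
    # Whether the driver reports P2P reads as supported between each GPU pair.
    "$tool" topo -p2p r
}
```

Calling `show_p2p_info` on the affected machine and attaching the output next to the simpleP2P result would show whether the driver claims P2P support that the samples then fail to verify.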

nvidia-bug-report.log.gz (680.7 KB)

# Machine:
# https://rog.asus.com/nl/motherboards/rog-zenith/rog-zenith-ii-extreme-alpha-model/
$ uname -a
Linux senor0lunlx0163 5.15.0-60-generic #66~20.04.1-Ubuntu SMP Wed Jan 25 09:41:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:        20.04
Codename:       focal
$ nvidia-smi
Fri Feb 10 16:12:49 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
|  0%   51C    P8    19W / 450W |    232MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:23:00.0 Off |                  Off |
|  0%   51C    P8    25W / 450W |     10MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2060      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A     12762      G   /usr/lib/xorg/Xorg                 98MiB |
|    0   N/A  N/A     12979      G   /usr/bin/gnome-shell               63MiB |
|    0   N/A  N/A     14647      G   ...106669053894826474,131072       16MiB |
|    1   N/A  N/A      2060      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A     12762      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

$ cat motherboard.info
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: ROG ZENITH II EXTREME ALPHA
        Version: Rev 1.xx
        Serial Number: 210788043200148
        Asset Tag: Default string
        Features:
                Board is a hosting board
                Board is removable
                Board is replaceable
        Location In Chassis: Default string
        Chassis Handle: 0x0003
        Type: Motherboard
        Contained Object Handles: 0

Handle 0x003C, DMI type 10, 6 bytes
On Board Device Information
        Type: Video
        Status: Enabled
        Description:    To Be Filled By O.E.M.

Handle 0x0042, DMI type 41, 11 bytes
Onboard Device
        Reference Designation:  Onboard IGD
        Type: Video
        Status: Enabled
        Type Instance: 1
        Bus Address: 0000:00:02.0

Handle 0x0043, DMI type 41, 11 bytes
Onboard Device
        Reference Designation:  Onboard LAN
        Type: Ethernet
        Status: Enabled
        Type Instance: 1
        Bus Address: 0000:00:19.0

Handle 0x0044, DMI type 41, 11 bytes
Onboard Device
        Reference Designation:  Onboard 1394
        Type: Other
        Status: Enabled
        Type Instance: 1
        Bus Address: 0000:03:1c.2