Standard nVidia CUDA tests fail with dual RTX 4090 Linux box

kinred · February 20, 2023, 9:02am

We cannot confirm that RTX 6000 Ada GPUs have this problem on AMD EPYC or WRX80-based platforms. P2P copy does not have to be disabled when using RTX 6000 Ada GPUs.

More importantly, the transferred data is correct. Multi-RTX 6000 Ada setups seem to work without problems.

Some findings for multi-RTX 4090 setups:

When P2P copy is disabled with NCCL_P2P_DISABLE on AMD EPYC/WRX80, the lock-up can be bypassed, but then the data transferred between the GPUs is not copied correctly (the destination data is all 0 or all NaN)! This can be tested with, for example:

github.com/ndd314/cuda_examples/blob/master/0_Simple/simpleP2P/simpleP2P.cu

The multi-GPU RTX 4090 problem is not specific to AMD CPUs. On the Intel CPUs we tested (for example, a Xeon Silver 4309Y), the transfer does not lock up (NCCL_P2P_DISABLE has no effect), but the data is still not copied correctly (the destination is all 0 or NaN)! This happens whether or not NCCL_P2P_DISABLE is set, which of course is expected, since the example above calls CUDA directly rather than going through the higher-level NCCL library.
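For anyone who wants to reproduce the symptom without the full sample, here is a minimal sketch of the kind of check simpleP2P performs (it is not the sample itself). It fills a buffer on GPU 0 with a known pattern, enables peer access where `cudaDeviceCanAccessPeer` reports support, copies to GPU 1 with `cudaMemcpyPeer`, and counts wrong, zero and NaN elements on the way back. It assumes devices 0 and 1 are the pair under test; `pattern`, `check_copy` and the `CHECK` macro are illustrative helpers. Since it only uses the CUDA runtime, NCCL_P2P_DISABLE does not influence it.

```cuda
// Sketch: check that a GPU-to-GPU copy actually delivers the data.
// Assumption: CUDA devices 0 and 1 are the pair under test.
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Known test pattern: small integers, exactly representable as float.
static float pattern(size_t i) { return static_cast<float>(i % 1024); }

struct CopyCheck {
    size_t wrong = 0;  // elements that differ from the pattern
    size_t zeros = 0;  // elements equal to 0.0f
    size_t nans = 0;   // elements that are NaN
};

// Classify the received buffer. NaN compares unequal to everything,
// so NaN elements are also counted as wrong.
CopyCheck check_copy(const std::vector<float>& dst) {
    CopyCheck c;
    for (size_t i = 0; i < dst.size(); ++i) {
        if (std::isnan(dst[i])) ++c.nans;
        else if (dst[i] == 0.0f) ++c.zeros;
        if (dst[i] != pattern(i)) ++c.wrong;
    }
    return c;
}

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        std::printf("need at least two GPUs, found %d\n", count);
        return 0;
    }
    const int src_dev = 0, dst_dev = 1;
    int can_src = 0, can_dst = 0;
    CHECK(cudaDeviceCanAccessPeer(&can_src, src_dev, dst_dev));
    CHECK(cudaDeviceCanAccessPeer(&can_dst, dst_dev, src_dev));
    std::printf("peer access %d->%d: %d, %d->%d: %d\n",
                src_dev, dst_dev, can_src, dst_dev, src_dev, can_dst);

    const size_t n = size_t(1) << 24;  // 16M floats = 64 MiB
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n);
    for (size_t i = 0; i < n; ++i) host[i] = pattern(i);

    float *src = nullptr, *dst = nullptr;
    CHECK(cudaSetDevice(src_dev));
    if (can_src) CHECK(cudaDeviceEnablePeerAccess(dst_dev, 0));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMemcpy(src, host.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(dst_dev));
    if (can_dst) CHECK(cudaDeviceEnablePeerAccess(src_dev, 0));
    CHECK(cudaMalloc(&dst, bytes));
    // 0xFF bytes form a NaN bit pattern, so a copy that silently does
    // nothing cannot pass the check by accident.
    CHECK(cudaMemset(dst, 0xFF, bytes));

    // With peer access enabled the driver can copy GPU-to-GPU directly;
    // otherwise the copy is typically staged through host memory.
    CHECK(cudaMemcpyPeer(dst, dst_dev, src, src_dev, bytes));
    CHECK(cudaDeviceSynchronize());

    std::vector<float> result(n);
    CHECK(cudaMemcpy(result.data(), dst, bytes, cudaMemcpyDeviceToHost));
    const CopyCheck c = check_copy(result);
    std::printf("wrong: %zu of %zu (zero: %zu, NaN: %zu) -> %s\n",
                c.wrong, n, c.zeros, c.nans, c.wrong ? "FAIL" : "PASS");

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(src_dev));
    CHECK(cudaFree(src));
    return c.wrong ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

Build with something like `nvcc -o p2p_check p2p_check.cu`; CUDA_VISIBLE_DEVICES can be used to choose which physical pair appears as devices 0 and 1.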

The RTX 4090 is currently not usable for multi-GPU work, on either Intel or AMD. From our analysis, the cause seems to be a (possibly) broken CUDA UVA implementation.
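Since the suspicion falls on UVA, a second sketch can isolate that path. With unified addressing, `cudaPointerGetAttributes` reports which device owns an allocation, and `cudaMemcpy` with `cudaMemcpyDefault` infers source and destination from the pointer values alone, similar to how the UVA part of simpleP2P copies between GPUs. This is an illustrative test assuming devices 0 and 1 are the pair in question; `owning_device`, `first_mismatch` and `CHECK` are hypothetical helpers, not part of any NVIDIA sample.

```cuda
// Sketch: exercise the UVA copy path between two GPUs.
// Assumption: CUDA devices 0 and 1 are the pair under test.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Device ordinal owning ptr according to its UVA pointer attributes,
// or -1 if ptr is not plain device memory.
int owning_device(const void* ptr) {
    cudaPointerAttributes attr{};
    CHECK(cudaPointerGetAttributes(&attr, ptr));
    return attr.type == cudaMemoryTypeDevice ? attr.device : -1;
}

// Index of the first element where actual differs from expected,
// or -1 if both vectors are identical.
long first_mismatch(const std::vector<int>& expected,
                    const std::vector<int>& actual) {
    const size_t n = std::min(expected.size(), actual.size());
    for (size_t i = 0; i < n; ++i)
        if (expected[i] != actual[i]) return static_cast<long>(i);
    return expected.size() == actual.size() ? -1 : static_cast<long>(n);
}

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        std::printf("need at least two GPUs, found %d\n", count);
        return 0;
    }
    for (int d = 0; d < 2; ++d) {
        int uva = 0;
        CHECK(cudaDeviceGetAttribute(&uva, cudaDevAttrUnifiedAddressing, d));
        std::printf("device %d: unifiedAddressing = %d\n", d, uva);
        if (!uva) return EXIT_FAILURE;
    }

    const size_t n = size_t(1) << 22;
    const size_t bytes = n * sizeof(int);
    std::vector<int> expected(n);
    for (size_t i = 0; i < n; ++i) expected[i] = static_cast<int>(i);

    int *a = nullptr, *b = nullptr;
    int can = 0;
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceCanAccessPeer(&can, 0, 1));
    if (can) CHECK(cudaDeviceEnablePeerAccess(1, 0));
    CHECK(cudaMalloc(&a, bytes));
    CHECK(cudaMemcpy(a, expected.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceCanAccessPeer(&can, 1, 0));
    if (can) CHECK(cudaDeviceEnablePeerAccess(0, 0));
    CHECK(cudaMalloc(&b, bytes));
    CHECK(cudaMemset(b, 0xFF, bytes));  // every int becomes -1, never in the pattern

    // With UVA the pointer value alone identifies the owning device.
    std::printf("a is on device %d, b is on device %d\n",
                owning_device(a), owning_device(b));

    // cudaMemcpyDefault: the runtime infers both devices from the pointers.
    CHECK(cudaMemcpy(b, a, bytes, cudaMemcpyDefault));
    std::vector<int> actual(n);
    CHECK(cudaMemcpy(actual.data(), b, bytes, cudaMemcpyDeviceToHost));

    const long bad = first_mismatch(expected, actual);
    if (bad < 0) std::printf("UVA copy PASS\n");
    else std::printf("UVA copy FAIL at element %ld (got %d, want %d)\n",
                     bad, actual[bad], expected[bad]);

    CHECK(cudaFree(b));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(a));
    return bad < 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

If the explicit `cudaMemcpyPeer` check passes but this one fails (or the reverse), that would help narrow down which copy path is affected.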

Related topics

Topic | Category (tags) | Replies | Views | Activity
Parallel training with 4 cards 4090 cannot be performed on AMD 5975WX， stuck at the beginning | CUDA Programming and Performance | 14 | 5965 | February 20, 2023
P2P not working for P600s? | CUDA Programming and Performance | 7 | 1816 | April 5, 2018
P2P issue using two RTX 5090 GPUs | CUDA Programming and Performance (cuda, kernel, ubuntu, linux-driver) | 12 | 1420 | March 16, 2025
NVidia driver 520.61.05 / Cuda 11.8 / RTX 3090 = black display and superslow modesets | Linux (cuda, ubuntu) | 21 | 24810 | December 6, 2022
Low P2P GPU bandwidth performance between GeForce GPUs | CUDA Programming and Performance | 20 | 986 | October 9, 2024
Using GTX 590 cards for CUDA SLI cards under CUDA? | CUDA Programming and Performance | 37 | 14242 | April 2, 2012
P2P access not enabled, is this a software or a hardware issue? | CUDA Setup and Installation | 7 | 9703 | November 10, 2015
why "all CUDA-capable devices are busy or unavailable" ? | CUDA Programming and Performance | 34 | 64453 | April 20, 2011
Keep getting "GPU has fallen off the bus" with 3090 cards on Gigabyte MZ32-AR1 Rev 3.0 motherboard | Linux (gaming) | 18 | 330 | June 10, 2025
GPU Utilization Drops after Consecutive Executions | CUDA Programming and Performance | 28 | 5728 | October 2, 2013