Standard nVidia CUDA tests fail with dual RTX 4090 Linux box

We cannot confirm that the RTX 6000 Ada GPUs have this problem on AMD EPYC or WRX80-based systems. P2P copy does not have to be disabled when using RTX 6000 Ada GPUs.

More importantly, the transferred data is correct. Multi-GPU RTX 6000 Ada setups seem to work without problems.

Some findings for the multi RTX 4090 setups:

  • When P2P copy is disabled with NCCL_P2P_DISABLE on AMD EPYC/WRX80, the locking problem can be bypassed, but the data transferred between the GPUs is then not copied correctly! (The destination data is all 0 or all NaN.) This can be tested with, for example:
  • The multi-GPU RTX 4090 problem is not specific to AMD CPUs. On the Intel CPUs we tested (for example Xeon Silver 4309Y), the transfer is not blocked, but the data is also not copied correctly (destination all 0 or NaN)! This is independent of whether NCCL_P2P_DISABLE is set or not (which of course should have no effect, as the above example uses CUDA directly and not the higher-level NCCL library).
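
For anyone who wants to check their own box, here is a minimal sketch of a copy-integrity test along the lines described above. It is not the test we ran, only an illustration: it fills a buffer on one GPU, copies it to a second GPU with cudaMemcpyPeer, reads it back, and counts mismatches. The device indices 0 and 1, the buffer size, the CHECK macro and the count_mismatches helper are all assumptions made for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: count elements of `got` that differ from `want`.
// NaN never compares equal, so an all-NaN destination counts as mismatched.
static size_t count_mismatches(const float* want, const float* got, size_t n) {
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (!(want[i] == got[i])) ++bad;
    return bad;
}

int main() {
    int device_count = 0;
    CHECK(cudaGetDeviceCount(&device_count));
    if (device_count < 2) {
        std::printf("Need at least two GPUs, found %d\n", device_count);
        return EXIT_SUCCESS;
    }

    const int src_dev = 0, dst_dev = 1;  // assumption: test GPUs 0 and 1
    const size_t n = 1 << 24;            // 16M floats = 64 MiB
    const size_t bytes = n * sizeof(float);

    // Non-zero pattern, so a zeroed destination is easy to spot.
    std::vector<float> src(n), dst(n, -1.0f);
    for (size_t i = 0; i < n; ++i) src[i] = static_cast<float>(i % 1000) + 1.0f;

    float *d_src = nullptr, *d_dst = nullptr;
    CHECK(cudaSetDevice(src_dev));
    CHECK(cudaMalloc(&d_src, bytes));
    CHECK(cudaMemcpy(d_src, src.data(), bytes, cudaMemcpyHostToDevice));

    CHECK(cudaSetDevice(dst_dev));
    CHECK(cudaMalloc(&d_dst, bytes));

    // Report whether the driver claims direct peer access, and enable it if so.
    int can_access = 0;
    CHECK(cudaDeviceCanAccessPeer(&can_access, dst_dev, src_dev));
    std::printf("GPU %d can access GPU %d directly: %s\n",
                dst_dev, src_dev, can_access ? "yes" : "no");
    if (can_access) CHECK(cudaDeviceEnablePeerAccess(src_dev, 0));

    // GPU-to-GPU copy, then read the destination back to the host.
    CHECK(cudaMemcpyPeer(d_dst, dst_dev, d_src, src_dev, bytes));
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaMemcpy(dst.data(), d_dst, bytes, cudaMemcpyDeviceToHost));

    const size_t bad = count_mismatches(src.data(), dst.data(), n);
    std::printf("%zu of %zu elements differ after the peer copy\n", bad, n);

    CHECK(cudaFree(d_dst));
    CHECK(cudaSetDevice(src_dev));
    CHECK(cudaFree(d_src));
    return bad == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

This should build with something like `nvcc -o p2p_check p2p_check.cu` (the file name is arbitrary). A non-zero mismatch count, in particular a destination that is all 0 or all NaN, would match the symptom described above. Since the program only uses the CUDA runtime, setting NCCL_P2P_DISABLE should not change its result.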

The RTX 4090 is currently not usable for multi-GPU work, neither on Intel nor on AMD. From our analysis, the cause seems to be a (possibly broken) CUDA UVA implementation.
