NCCL example fails on WSL2 with 1 or 2 A5500s

Below is the NCCL failure when running on one A5500 on WSL2; the same failure occurs with two A5500s. Any ideas how to fix this? I get the same error when I try to use NCCL with an LLM.

./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1

nThread 1 nGpus 1 minBytes 8 maxBytes 134217728 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0

Using devices

Rank 0 Group 0 Pid 402242 on thanos device 0 [0x51] NVIDIA RTX A5500

thanos:402242:402242 [0] NCCL INFO Bootstrap : Using eth0:172.23.106.2<0>
thanos:402242:402242 [0] NCCL INFO cudaDriverVersion 12060
thanos:402242:402242 [0] NCCL INFO NCCL version 2.22.3+cuda12.6
thanos:402242:402251 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
thanos:402242:402251 [0] NCCL INFO NET/IB : No device found.
thanos:402242:402251 [0] NCCL INFO NET/Socket : Using [0]eth0:172.23.106.2<0>
thanos:402242:402251 [0] NCCL INFO Using network Socket
thanos:402242:402251 [0] NCCL INFO ncclCommInitRank comm 0x55d0567abb30 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId 51000 commId 0x51cb079389fbb9d7 - Init START
thanos:402242:402251 [0] NCCL INFO comm 0x55d0567abb30 rank 0 nRanks 1 nNodes 1 localRanks 1 localRank 0 MNNVL 0
thanos:402242:402251 [0] NCCL INFO Channel 00/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 01/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 02/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 03/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 04/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 05/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 06/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 07/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 08/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 09/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 10/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 11/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 12/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 13/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 14/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 15/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 16/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 17/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 18/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 19/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 20/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 21/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 22/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 23/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 24/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 25/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 26/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 27/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 28/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 29/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 30/32 : 0
thanos:402242:402251 [0] NCCL INFO Channel 31/32 : 0
thanos:402242:402251 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
thanos:402242:402251 [0] NCCL INFO P2P Chunksize set to 131072

thanos:402242:402251 [0] include/alloc.h:123 NCCL WARN Cuda failure 999 'unknown error'
thanos:402242:402251 [0] NCCL INFO include/alloc.h:215 -> 1
thanos:402242:402251 [0] NCCL INFO channel.cc:42 -> 1
thanos:402242:402251 [0] NCCL INFO init.cc:544 -> 1
thanos:402242:402251 [0] NCCL INFO init.cc:1156 -> 1
thanos:402242:402251 [0] NCCL INFO init.cc:1408 -> 1
thanos:402242:402251 [0] NCCL INFO group.cc:70 -> 1 [Async thread]
thanos:402242:402242 [0] NCCL INFO group.cc:420 -> 1
thanos:402242:402242 [0] NCCL INFO group.cc:546 -> 1
thanos:402242:402242 [0] NCCL INFO group.cc:101 -> 1
thanos:402242:402242 [0] NCCL INFO init.cc:1761 -> 1
thanos: Test NCCL failure common.cu:1005 'unhandled cuda error (run with NCCL_DEBUG=INFO for details) / '
… thanos pid 402242: Test failure common.cu:891
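
For anyone reading the trace above: as far as I can tell, the single `NCCL WARN` line is the actual failure (a CUDA call in `include/alloc.h` returning error 999), and the `file:line` lines ending in `1` that follow are just that error being passed back up the call stack. The code `1` appears to correspond to `ncclUnhandledCudaError`, which matches the final "unhandled cuda error" message. Here is a minimal Python sketch for pulling those two pieces out of a saved log. The function name `summarize` and the regular expressions are my own assumptions for illustration, not part of NCCL or nccl-tests, and the sample lines are copied from the log above.

```python
import re

# A few lines copied from the log in this post.
SAMPLE_LOG = """\
thanos:402242:402251 [0] include/alloc.h:123 NCCL WARN Cuda failure 999 'unknown error'
thanos:402242:402251 [0] NCCL INFO include/alloc.h:215 -> 1
thanos:402242:402251 [0] NCCL INFO channel.cc:42 -> 1
thanos:402242:402251 [0] NCCL INFO init.cc:544 -> 1
"""

# Hypothetical patterns (assumptions, not an NCCL-defined format):
# WARN lines look like "<file>:<line> NCCL WARN <message>".
WARN = re.compile(r"(\S+:\d+) NCCL WARN (.*)")
# Propagation lines look like "NCCL INFO <file>:<line> -> <code>";
# the arrow may show up as "→" if the forum reformatted it.
TRAIL = re.compile(r"NCCL INFO (\S+:\d+) (?:->|→) (\d+)")


def summarize(log_text):
    """Return the first WARN as (location, message) and the list of
    (location, code) entries that propagate the error."""
    warn = None
    trail = []
    for line in log_text.splitlines():
        m = WARN.search(line)
        if m and warn is None:
            warn = (m.group(1), m.group(2))
            continue
        m = TRAIL.search(line)
        if m:
            trail.append((m.group(1), int(m.group(2))))
    return warn, trail


warn, trail = summarize(SAMPLE_LOG)
print("first warning:", warn)
print("error trail:", trail)
```

To run it on a real log, read the file contents and pass them to `summarize` in place of `SAMPLE_LOG`. The first warning is usually the line worth quoting when asking for help, since everything after it only repeats the same return code.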

Can you please enable the debug logs by setting NCCL_DEBUG=INFO and share the logs with us?

@AakankshaS That has already been done. NCCL_DEBUG=INFO was set, and the output shown above came from this command:

NCCL_DEBUG=INFO ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1

@AakankshaS - any updates?