Quad (4x) A6000 WSL2 CUDA Init Errors

Hey there,

I’m running into some issues with WSL2 on a 4x A6000 machine. I’ve tried both the CUDA 11.7 and 12.0 samples and get “Error: only 0 Devices available, 1 requested. Exiting.” after successfully building and then attempting to run the nbody CUDA sample.
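
For reference, a quick way to ask the CUDA runtime directly how many devices it sees, independent of the samples, is a short ctypes sketch like the one below. The library name libcudart.so is an assumption; adjust it to whichever libcudart your ldconfig -p actually lists.

import ctypes

# Load the CUDA runtime; adjust the name (e.g. libcudart.so.12) to match
# whatever ldconfig -p lists on your system.
libcudart = ctypes.CDLL("libcudart.so")

count = ctypes.c_int(0)
# cudaError_t cudaGetDeviceCount(int*); 0 means cudaSuccess.
err = libcudart.cudaGetDeviceCount(ctypes.byref(count))
print(f"cudaGetDeviceCount error code: {err}, devices: {count.value}")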

$ echo $PATH returns the expected CUDA path:
/usr/local/cuda-12.0/bin

$ echo $CUDA_HOME returns the following:
/usr/local/cuda-12.0/

$ ldconfig -p lists the following CUDA libraries as registered:

nvidia-smi returns the following:

…what exactly am I missing from this seemingly appropriate WSL2 setup? I did not install NVIDIA display drivers inside WSL; I am on a brand-new WSL2 Ubuntu image running the latest kernel from Microsoft, on Windows 11 22H2.

Driver versions are as shown in the nvidia-smi output above.

In PyTorch, calling torch.cuda.is_available() produces the following errors (a minimal reproduction sketch follows the list):

  1. “UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error?”
  2. “Error 2: out of memory” (Triggered internally at /opt/pytorch/pytorch/c10/cuda/CUDAFunctions.cpp:109)
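
For completeness, this is roughly the minimal PyTorch check that triggers it, assuming a stock PyTorch install and nothing CUDA-related running beforehand:

import torch

# No CUDA work happens before these calls; the warning fires on the first
# CUDA query because cudaGetDeviceCount() fails underneath.
print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
print("is_available:", torch.cuda.is_available())   # emits the UserWarning quoted above
print("device_count:", torch.cuda.device_count())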

Any help here would be wildly appreciated. Thanks!

Update:

If I do a quick:
$ export CUDA_VISIBLE_DEVICES=3 (or 1, or 2)

everything works fine. Moving up to the final GPU with $ export CUDA_VISIBLE_DEVICES=4 causes a failure.
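
For anyone else hitting this, a quick way to probe each index (0 through 3 on a four-GPU box) one at a time is to set CUDA_VISIBLE_DEVICES per subprocess before torch is imported. This is just a sketch; the inline probe string and the range of indices are assumptions about a four-GPU setup.

import os
import subprocess
import sys

# CUDA_VISIBLE_DEVICES is only read when the runtime first initializes in a
# process, so each index is probed in a fresh interpreter.
probe = "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

for idx in range(4):  # GPU indices 0-3
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(idx))
    result = subprocess.run([sys.executable, "-c", probe],
                            env=env, capture_output=True, text=True)
    print(f"GPU {idx}: {result.stdout.strip()} {result.stderr.strip()[:120]}")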

The same issue happens with WSL2 on my 4x A6000 machine, but in my case any CUDA_VISIBLE_DEVICES setting that includes GPU 1 fails, while 0, 2, and 3 work fine. Still looking for solutions. :(

Thanks, I finally made things work by limiting it to 3 visible devices. I’m using 4x 2080 Tis and was facing the same problem.
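
In case it helps, the “limit to 3 visible devices” workaround looks something like this in a script; the indices 0,1,2 are just an example and the problematic index will differ per machine:

import os

# Must be set before the first import of torch (or anything else that
# initializes CUDA), otherwise all four GPUs have already been enumerated.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"  # example: drop the failing index

import torch

print("visible devices:", torch.cuda.device_count())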

Given that this is now mildly repro’ed by three people, does any NVIDIA team member have thoughts on whether this is a Microsoft/WSL issue or an NVIDIA one? Or perhaps a correctable user error?