Does CUDA+WSL2 work with a GT 710?

I’m trying to use CUDA in WSL2 with an MSI GT 710, but every sample I try gives me an error and the GPU is never used. I was expecting at least the nbody benchmark to work.

:~/NVIDIA_CUDA-11.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 710"
  CUDA Driver Version / Runtime Version          11.1 / 11.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 1024 MBytes (1073741824 bytes)
  ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.1, CUDA Runtime Version = 11.0, NumDevs = 1
Result = PASS
:~/NVIDIA_CUDA-11.0_Samples/5_Simulations/nbody$ ./nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Kepler" with compute capability 3.5
> Compute 3.5 CUDA device: [GeForce GT 710]
CUDA error at bodysystemcuda_impl.h:159 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&m_deviceData[0].event)"
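The deviceQuery report above is read from the runtime's cudaDeviceProp structure, so the same fields can be checked with a few lines of code. A minimal sketch, assuming a CUDA 11.x toolkit inside WSL2 (build with something like nvcc -o mini_query mini_query.cu; the file name is hypothetical). The coresPerSM helper is a hypothetical, abbreviated stand-in for the lookup table the samples keep in helper_cuda.h:

```cuda
// Minimal deviceQuery-style sketch (assumption: CUDA 11.x runtime).
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: CUDA cores per SM for a subset of compute
// capabilities, modeled on the samples' table. Unknown versions return -1.
int coresPerSM(int major, int minor) {
    switch (major * 10 + minor) {
        case 30: case 32: case 35: case 37: return 192;  // Kepler
        case 50: case 52: case 53: return 128;           // Maxwell
        case 60: return 64;                              // Pascal GP100
        case 61: case 62: return 128;                    // Pascal GP10x
        case 70: case 72: case 75: return 64;            // Volta / Turing
        case 80: return 64;                              // Ampere GA100
        case 86: return 128;                             // Ampere GA10x
        default: return -1;
    }
}

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // encoded as 1000*major + 10*minor
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("CUDA Driver / Runtime Version: %d.%d / %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorName(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        printf("Device %d: \"%s\"\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Global memory: %zu MBytes\n", prop.totalGlobalMem >> 20);
        printf("  Multiprocessors: %d x %d cores/MP\n",
               prop.multiProcessorCount, coresPerSM(prop.major, prop.minor));
        printf("  Run time limit on kernels: %s\n",
               prop.kernelExecTimeoutEnabled ? "Yes" : "No");
    }
    return 0;
}
```

Note that in this thread deviceQuery passes while nbody fails, so a clean property report on its own does not show that kernels can actually run.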

I’ve read that the driver needs to be WDDM 2.9, but DxDiag shows WDDM 2.8. Could this be the problem?

DxDiag.txt (88.1 KB)

Hello, thanks for reaching out!

We just published a new driver that fixes caching issues on some specific systems, and from your description it looks like you ran into those issues, which would prevent the driver from creating a context correctly.
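To see whether context creation itself fails, rather than only the specific cudaEventCreate call that nbody reports at bodysystemcuda_impl.h:159, the steps can be exercised one at a time. The probe below is a hypothetical sketch, not one of the CUDA samples; it relies on cudaFree(0), a common idiom for forcing the runtime to create its context, so a context problem would likely surface at that step:

```cuda
// Hypothetical diagnostic: run the basic steps a sample like nbody depends
// on, one at a time, and report the first one that fails.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noopKernel() {}

// Returns true on success; otherwise prints the failing step and error.
bool step(const char* what, cudaError_t err) {
    if (err == cudaSuccess) {
        printf("[ok]   %s\n", what);
        return true;
    }
    printf("[FAIL] %s: %s (%d) - %s\n", what, cudaGetErrorName(err),
           static_cast<int>(err), cudaGetErrorString(err));
    return false;
}

int main() {
    if (!step("cudaSetDevice(0)", cudaSetDevice(0))) return 1;
    // Forces creation of the primary context before any other work.
    if (!step("context creation via cudaFree(0)", cudaFree(0))) return 1;

    cudaEvent_t event;
    if (!step("cudaEventCreate", cudaEventCreate(&event))) return 1;

    noopKernel<<<1, 1>>>();
    if (!step("kernel launch", cudaGetLastError())) return 1;
    if (!step("cudaDeviceSynchronize", cudaDeviceSynchronize())) return 1;

    if (!step("cudaEventDestroy", cudaEventDestroy(event))) return 1;
    printf("All steps succeeded.\n");
    return 0;
}
```

If the first failure is at cudaFree(0), the problem lies below the sample code, in context creation, which matches the driver issue described above.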

Could you retry with the new driver (455.41)? https://developer.nvidia.com/cuda/wsl/download

Hopefully it solves your issue.

Let us know how it goes.

Thanks!

Hi, thanks for the update. Unfortunately, nbody still shows the same error, and DxDiag still shows WDDM 2.8.

DxDiag.455.41.txt (88.3 KB)

The native Windows build (VS2019) works:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.0\bin\win64\Debug>nbody.exe -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Kepler" with compute capability 3.5

> Compute 3.5 CUDA device: [GeForce GT 710]
1024 bodies, total time for 10 iterations: 243.487 ms
= 0.043 billion interactions per second
= 0.861 single-precision GFLOP/s at 20 flops per interaction
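As a quick sanity check, these figures are consistent with the sample counting N² body-body interactions per iteration, with N = 1024 bodies, k = 10 iterations, t = 243.487 ms, and the stated 20 flops per interaction:

```latex
\begin{aligned}
\text{interactions/s} &= \frac{N^2 k}{t}
  = \frac{1024^2 \times 10}{0.243487\ \text{s}}
  \approx 4.306 \times 10^{7} \approx 0.043 \times 10^{9} \\
\text{GFLOP/s} &= \frac{20 \times 4.306 \times 10^{7}}{10^{9}} \approx 0.861
\end{aligned}
```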

Thanks for replying back.

The WDDM version is not the issue: 2.8 is expected at this point.

We will need to look into this a bit more and get back to you. Meanwhile, could you verify with cuobjdump that the binary contains SASS kernels for your SM version? https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html This is just to double-check that you are not hitting the PTX JIT path, which is not yet supported.
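For a standalone check, a tiny kernel can be compiled for sm_35 only and the resulting executable inspected with cuobjdump. A sketch, assuming CUDA 11.x, where compute capability 3.5 is still supported but deprecated; the file name sass_check.cu and the encodeSm helper are hypothetical. cudaFuncGetAttributes needs a working context, so on the failing WSL2 setup it may report the same error, whereas cuobjdump only reads the file. Also, binaryVersion may not distinguish native SASS from JIT-compiled PTX, so the cuobjdump listing is the more direct evidence:

```cuda
// Build with real sm_35 SASS only (no PTX), silencing the deprecation warning:
//   nvcc -gencode arch=compute_35,code=sm_35 -Wno-deprecated-gpu-targets -o sass_check sass_check.cu
// Then list what the executable actually contains:
//   cuobjdump --list-elf sass_check   (SASS/cubin images)
//   cuobjdump --list-ptx sass_check   (PTX images that would need JIT)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void probeKernel(int* out) { *out = 42; }

// Hypothetical helper: same encoding the runtime uses (3.5 -> 35).
int encodeSm(int major, int minor) { return major * 10 + minor; }

int main() {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, probeKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorName(err));
        return 1;
    }
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    printf("device sm_%d, kernel binaryVersion %d, ptxVersion %d\n",
           encodeSm(prop.major, prop.minor), attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```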

Again, thank you so much for double-checking the driver version.

Hi, here is the output of "cuobjdump -sass nbody.10.sm_35.cubin > dump.txt". Is this correct?

dump.txt (765.1 KB)

Thanks,

I don’t see anything obviously wrong that could cause this issue with your setup.

== EDIT == I didn’t realize you had already posted your DxDiag logs before I asked. Sorry about that, and thanks a lot! == EDIT ==

We will try to reproduce the issue in-house based on the data in the dxdiag file and see if we can find something.

Thanks for reporting this issue; we will keep you posted.

I have exactly the same issue with WSL2 + GT 710 + CUDA.
Are there any updates?

We have made a couple of fixes to address issues with these cards in our new driver drop here: ‘Preview for CUDA on WSL Updated for Performance’.

Let us know if this new driver fixes your issue.

Thanks,


I just installed driver 460.15 on Dev build 20206, and now the nbody GPU benchmark works in WSL2!

Also, the GPU is now available to TensorFlow in nvidia-docker.

I’m going to test more things, but so far everything looks OK.
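For testing beyond nbody, a functional check that compares a GPU result against a CPU reference can catch problems that a successful kernel launch alone would not. A minimal, hypothetical sketch using a SAXPY kernel (not from the thread), assuming a CUDA 11.x toolkit:

```cuda
// Hypothetical smoke test: run SAXPY (y = a*x + y) on the GPU and compare
// the result with a CPU reference computed from the same inputs.
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// CPU reference used to check the GPU result.
void saxpyHost(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n), y(n, 1.0f), ref(n, 1.0f);
    for (int i = 0; i < n; ++i) x[i] = i * 0.001f;
    saxpyHost(n, a, x.data(), ref.data());

    float *dx = nullptr, *dy = nullptr;
    if (cudaMalloc(&dx, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&dy, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, a, dx, dy);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel: %s\n", cudaGetErrorName(err));
        return 1;
    }
    cudaMemcpy(y.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Small relative tolerance: the GPU may fuse a*x+y into one FMA.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (std::fabs(y[i] - ref[i]) > 1e-5f * std::fabs(ref[i]) + 1e-6f) ++mismatches;
    printf("%s (%d mismatches)\n", mismatches == 0 ? "PASS" : "FAIL", mismatches);

    cudaFree(dx);
    cudaFree(dy);
    return mismatches == 0 ? 0 : 1;
}
```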

Thanks a lot!