Does CUDA+WSL2 work with a GT 710?

onomatopellan · June 19, 2020, 11:14am

I’m trying to use CUDA in WSL2 with an MSI GT 710, but the samples give me an error and the GPU is never used. I was expecting at least the nbody benchmark to work.

:~/NVIDIA_CUDA-11.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 710"
  CUDA Driver Version / Runtime Version          11.1 / 11.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 1024 MBytes (1073741824 bytes)
  ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.1, CUDA Runtime Version = 11.0, NumDevs = 1
Result = PASS

:~/NVIDIA_CUDA-11.0_Samples/5_Simulations/nbody$ ./nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Kepler" with compute capability 3.5
> Compute 3.5 CUDA device: [GeForce GT 710]
CUDA error at bodysystemcuda_impl.h:159 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&m_deviceData[0].event)"

I’ve read that the driver needs to be WDDM 2.9, but DxDiag shows WDDM 2.8. Could this be the problem?

DxDiag.txt (88.1 KB)

rboissel · June 19, 2020, 6:53pm

Hello, thanks for reaching out!

We just published a new driver that fixes caching issues on some specific systems. From your description, it looks like you are running into those issues, which would prevent the driver from creating a context correctly.

Could you retry: https://developer.nvidia.com/cuda/wsl/download (455.41)

Hopefully it should solve your issue.

Let us know how it goes,

Thanks!

onomatopellan · June 19, 2020, 7:51pm

Hi, thanks for the update. Unfortunately nbody keeps showing the same error and DxDiag keeps showing WDDM 2.8.

DxDiag.455.41.txt (88.3 KB)

The Windows version (built with VS2019) works:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.0\bin\win64\Debug>nbody.exe -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Kepler" with compute capability 3.5

> Compute 3.5 CUDA device: [GeForce GT 710]
1024 bodies, total time for 10 iterations: 243.487 ms
= 0.043 billion interactions per second
= 0.861 single-precision GFLOP/s at 20 flops per interaction

rboissel · June 19, 2020, 8:07pm

Thanks for replying back.

The WDDM version is not an issue: 2.8 is expected at this point.

We will need to look into this a bit more and get back to you. Meanwhile, could you verify with cuobjdump that there are SASS kernels for your sm version (see CUDA Binary Utilities :: CUDA Toolkit Documentation)? This is just to double-check that you are not hitting the PTX JIT path, which is not yet supported.

Again thank you so much for double checking the driver version.
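
The check requested above can be sketched roughly as follows. This is a minimal Python sketch, not an official procedure: it assumes cuobjdump is on the PATH, that `cuobjdump -lelf <binary>` lists the embedded cubins with names of the form nbody.10.sm_35.cubin (the naming that appears later in this thread), and the helper names (list_embedded_elf, embedded_sm_archs, has_sass_for) are made up for illustration.

```python
import re
import subprocess


def list_embedded_elf(binary: str) -> str:
    """Run `cuobjdump -lelf` on a binary and return its text output.

    Assumption: cuobjdump (from the CUDA toolkit) is on the PATH.
    """
    result = subprocess.run(
        ["cuobjdump", "-lelf", binary],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def embedded_sm_archs(listing: str) -> set[str]:
    """Collect the sm_XX architectures named in cubin file names.

    Assumption: cubins are listed as <name>.<index>.sm_XX.cubin.
    """
    return set(re.findall(r"\.(sm_\d+)\.cubin", listing))


def has_sass_for(listing: str, arch: str) -> bool:
    """True if the listing contains a cubin (SASS) for the given arch."""
    return arch in embedded_sm_archs(listing)
```

For a GT 710 (compute capability 3.5), one would call has_sass_for(list_embedded_elf("./nbody"), "sm_35"). If it returns False, the binary likely carries only PTX for that device and would go through the JIT path mentioned above.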

onomatopellan · June 19, 2020, 8:51pm

Hi, this is the output of cuobjdump -sass nbody.10.sm_35.cubin > dump.txt. Is this correct?

dump.txt (765.1 KB)

rboissel · June 20, 2020, 2:58am

Thanks,

I don’t see anything obviously wrong that could cause this issue with your setup.

== EDIT == I didn’t realize you had already posted your dxdiag logs before I asked. Sorry about that, and thanks a lot! == EDIT ==

We will try to reproduce the issue in-house based on the data in the dxdiag file and see if we can find something.

Thanks for reporting this issue; we will keep you posted.

nautes.ahn · August 7, 2020, 12:39am

I have exactly the same issue with WSL2 + GT710 + CUDA.
Are there any updates?

rboissel · September 3, 2020, 1:01am

We have made a couple of fixes to address issues with these cards in our new drop here: ‘Preview for CUDA on WSL Updated for Performance’

Let us know if this new driver fixes your issue.

Thanks,

onomatopellan · September 3, 2020, 2:39pm

I just installed Driver 460.15 on Dev build 20206 and now nbody gpu benchmark works in WSL2!

Also, the GPU is now available in TensorFlow in nvidia-docker.

I’m gonna test more things but so far everything looks OK.

Thanks a lot!
