Slow performance on WSL2

Please provide complete information as applicable to your setup.

• Hardware Platform GPU
• DeepStream Version 7.0
• NVIDIA GPU Driver Version (valid for GPU only)

NVIDIA-SMI 535.72
Driver Version: 536.45
CUDA Version: 12.2
RTX A4500 Laptop GPU
WSL2

• Question

I am running a sample application on a WSL2 machine with the GPU.

However, the FPS I get out of it is very low. On native Ubuntu I get 5-10x more FPS than on WSL2.

Before we dive into any code reviews, are there any limitations when I use WSL2?

Is this expected behaviour?

Best regards
Oleg

Can you measure the GPU loading inside WSL2 with “nvidia-smi dmon” when you run the sample?

Sure:

# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     28     72      -     25     13      0      0      0      0    405    210
    0     29     72      -     21     11      0      0      0      0    405    210
    0     29     73      -     81     17      0      0      0      0    405    210
    0     27     72      -     46     19      0      0      0      0    405    210
    0     27     72      -     18     15      0     11      0      0    405    210
    0     26     72      -     22     20      0     30      0      0    405    210
    0     29     73      -     57     35     35     24      0      0    405    210
    0     26     73      -      9     17      5      5      0      0    405    210
    0     27     73      -     20     26     14      9      0      0    405    210
    0     27     73      -     33     30      7      8      0      0    405    210
    0     27     73      -     24     26      9      7      0      0    405    210
    0     26     73      -      6     21      7      3      0      0    405    210
    0     26     73      -     31     28      7      7      0      0    405    210
    0     27     73      -     15     22     10      4      0      0    405    210
    0     27     73      -     19     26     13      7      0      0    405    210
    0     27     73      -     20     34     11      8      0      0    405    210
    0     27     73      -     16     31     14      2      0      0    405    210
    0     27     73      -     26     26      8     11      0      0    405    210
    0     27     73      -     25     22     10      3      0      0    405    210
    0     27     73      -     24     20     10      8      0      0    405    210
    0     29     73      -     68     26      9      9      0      0    405    210
    0     27     73      -     36     27      9      5      0      0    405    210
    0     27     73      -     18     21     10      7      0      0    405    210
    0     27     73      -     19     20     11      2      0      0    405    210
    0     28     73      -     24     28     14      6      0      0    405    210
    0     27     73      -     28     33     10      9      0      0    405    210
    0     27     73      -     23     23     13      6      0      0    405    210
    0     27     73      -     23     23     16      9      0      0    405    210
    0     27     73      -     25     25     11     10      0      0    405    210
    0     27     73      -     33     27     19      7      0      0    405    210
    0     27     73      -     27     21     17      4      0      0    405    210
    0     28     73      -     28     26     24      9      0      0    405    210
    0     27     73      -     21     22     20      4      0      0    405    210
    0     27     73      -     17     26     25     13      0      0    405    210
    0     27     73      -     16     27     14      7      0      0    405    210
    0     27     73      -     30     29     27     10      0      0    405    210
    0     27     73      -     23     20     21     13      0      0    405    210
    0     27     73      -     39     28     26     11      0      0    405    210
    0     27     73      -     32     24     22     11      0      0    405    210
    0     27     73      -     17     26     12      9      0      0    405    210
    0     27     73      -     20     25     13      8      0      0    405    210
    0     27     73      -     24     20     16      8      0      0    405    210
    0     27     73      -     26     32     10      0      0      0    405    210
    0     27     73      -     31     28     15     10      0      0    405    210
    0     29     74      -     44     28     15      0      0      0    405    210
    0     29     74      -     44     31     15     12      0      0    405    210
    0     28     73      -      7     20      0      0      0      0    405    210
    0     27     73      -     17     24     11      6      0      0    405    210
    0     27     73      -     22     25     13      5      0      0    405    210
    0     28     73      -     23     28     14      7      0      0    405    210
    0     29     74      -     28     25     13      5      0      0    405    210
    0     28     74      -     30     24     16      4      0      0    405    210
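For anyone who wants to compare dumps like the one above between WSL2 and native Ubuntu, here is a minimal sketch that averages the columns of saved `nvidia-smi dmon` text. The helper names `parse_dmon` and `column_mean` are made up for illustration, and the sample rows are just the first three lines of the dump above.

```python
from statistics import mean


def parse_dmon(text):
    """Parse saved `nvidia-smi dmon` text into a list of {column: value} dicts."""
    columns, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first header line names the columns; the second lists units.
            if not columns:
                columns = line.lstrip("#").split()
            continue
        values = line.split()
        if len(values) != len(columns):
            continue  # skip malformed or truncated rows
        # "-" means the metric is not reported (mtemp in this thread).
        rows.append({c: (None if v == "-" else float(v))
                     for c, v in zip(columns, values)})
    return rows


def column_mean(rows, name):
    """Average one column, ignoring samples that were reported as "-"."""
    samples = [r[name] for r in rows if r.get(name) is not None]
    return mean(samples) if samples else None


sample = """\
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     28     72      -     25     13      0      0      0      0    405    210
    0     29     72      -     21     11      0      0      0      0    405    210
    0     29     73      -     81     17      0      0      0      0    405    210
"""

rows = parse_dmon(sample)
for name in ("sm", "mem", "enc", "dec", "mclk", "pclk"):
    print(name, column_mean(rows, name))
```

Running the same summary on a dump from each platform makes it easier to see whether SM, decoder, or encoder utilization differs, rather than eyeballing the raw columns.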

Which sample are you working on?

Have you checked whether your A4500 works in WDDM mode?
CUDA on WSL

I have Windows 11, so based on the docs I do not need to do anything, right?

>wsl cat /proc/version
Linux version 5.15.146.1-microsoft-standard-WSL2 (root@65c757a075e2) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Thu Jan 11 04:09:03 UTC 2024

I have executed deepstream-test3:

root@docker-desktop:/opt/nvidia/deepstream/deepstream-7.0/sources/deepstream_python_apps/apps/deepstream-test3# python3 -m deepstream.app.intro_10 -vv -c deepstream/config.yml

Sample Output:

Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.
Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.

Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.
Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.


**PERF:  {'stream0': 20.79}


Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.
Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.

Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.
Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.

Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.
Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.
Frame Number= 1152 Number of Objects= 19 Vehicle_count= 16 Person_count= 3
Frame Number= 1153 Number of Objects= 22 Vehicle_count= 19 Person_count= 3

**PERF:  {'stream0': 13.6}


Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.
Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.

Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.

Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.

Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.

Warning: gst-core-error-quark: A lot of buffers are being dropped. (13): ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstEglGlesSink:nvvideo-renderer:
There may be a timestamping problem, or this computer is too slow.

**PERF:  {'stream0': 19.2}
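To put a number on output like this, a rough sketch that pulls the `**PERF:` lines out of a saved log, averages the FPS per stream, and counts the dropped-buffer warnings. It assumes the PERF payload is a Python-style dict literal, as it appears above; the function name `summarize_log` is hypothetical, and the sample text reuses the three PERF values and the warning line from this log.

```python
import ast
import re

# Matches lines such as: **PERF:  {'stream0': 20.79}
PERF_RE = re.compile(r"^\*\*PERF:\s*(\{.*\})\s*$")
DROP_MARKER = "A lot of buffers are being dropped"


def summarize_log(text):
    """Return (mean FPS per stream, number of dropped-buffer warnings)."""
    fps = {}
    dropped = 0
    for line in text.splitlines():
        if DROP_MARKER in line:
            dropped += 1
            continue
        match = PERF_RE.match(line.strip())
        if match:
            # literal_eval only accepts literals, so it is safe on log text.
            for stream, value in ast.literal_eval(match.group(1)).items():
                fps.setdefault(stream, []).append(float(value))
    means = {stream: sum(v) / len(v) for stream, v in fps.items()}
    return means, dropped


sample_log = """\
Warning: gst-core-error-quark: A lot of buffers are being dropped. (13)
**PERF:  {'stream0': 20.79}
Warning: gst-core-error-quark: A lot of buffers are being dropped. (13)
**PERF:  {'stream0': 13.6}
**PERF:  {'stream0': 19.2}
"""

means, dropped = summarize_log(sample_log)
print(means, dropped)
```

The three PERF samples shown in this post average to roughly 17.9 FPS for `stream0`.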

nvidia-smi dmon output:

# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     23     70      -      4     10      0      0      0      0    405    210
    0     24     70      -      3     10      0      0      0      0    405    210
    0     24     71      -     21     46      0      0      0      0    405    210
    0     27     71      -     38     17      0      0      0      0    405    210
    0     25     71      -     13     27      0      0      0      0    405    210
    0     23     70      -      3     35      0      0      0      0    405    210
    0     22     70      -      0     54      0      0      0      0    405    210
    0     22     70      -      0     35      0      0      0      0    405    210
    0     22     70      -      0     33      0      0      0      0    405    210
    0     23     70      -     24     20      0      0      0      0    405    210
    0     28     71      -     24     10      0      0      0      0    405    210
    0     28     71      -     21     11      0      0      0      0    405    210
    0     28     71      -     16     10      0      0      0      0    405    210
    0     28     71      -     20     11      0      0      0      0    405    210
    0     28     71      -     23     11      0      0      0      0    405    210
    0     28     71      -     15     11      0      0      0      0    405    210
    0     28     71      -     16     10      0      0      0      0    405    210
    0     28     71      -     18     11      0      0      0      0    405    210
    0     28     71      -     38     23      0      0      0      0    405    210
    0     29     71      -     94     49      0     16      0      0    405    210
    0     30     71      -     36     18      0      0      0      0    405    210
    0     30     71      -     88     45      0      4      0      0    405    210
    0     31     71      -     93     58      0      6      0      0    405    210
    0     31     71      -     93     64      0     11      0      0    405    210
    0     31     71      -     91     62      0      9      0      0    405    210
    0     31     71      -     90     60      0      5      0      0    405    210
    0     31     71      -     94     64      0      6      0      0    405    210
    0     31     72      -     93     64      0      2      0      0    405    210
    0     32     71      -     98     69      0      7      0      0    405    210
    0     32     72      -     97     67      0      6      0      0    405    210
    0     32     72      -     84     53      0      3      0      0    405    210
    0     31     72      -     91     58      0     15      0      0    405    210
    0     31     72      -     96     62      0     11      0      0    405    210
    0     31     72      -     89     57      0      5      0      0    405    210
    0     31     72      -     94     60      0      6      0      0    405    210
    0     32     72      -     92     59      0      7      0      0    405    210
    0     32     72      -     95     62      0     13      0      0    405    210
    0     32     72      -     91     54      0     13      0      0    405    210
    0     32     72      -     95     62      0      6      0      0    405    210
    0     32     72      -     98     67      0     12      0      0    405    210
    0     32     72      -     88     59      0     10      0      0    405    210
    0     31     72      -     94     63      0      6      0      0    405    210
    0     31     72      -     94     64      0     10      0      0    405    210
    0     31     72      -     90     61      0      5      0      0    405    210
    0     32     72      -     92     59      0      8      0      0    405    210
    0     31     72      -     92     60      0      9      0      0    405    210
    0     32     72      -     95     66      0      7      0      0    405    210
    0     31     72      -     98     69      0      2      0      0    405    210
    0     32     72      -     94     64      0     10      0      0    405    210
    0     32     72      -     93     59      0     10      0      0    405    210
    0     32     72      -     94     61      0      4      0      0    405    210
    0     32     72      -     95     67      0      6      0      0    405    210
    0     32     72      -     96     65      0      5      0      0    405    210
    0     32     72      -     92     64      0     11      0      0    405    210
    0     32     72      -     91     59      0     15      0      0    405    210
    0     32     72      -     94     63      0      6      0      0    405    210
    0     32     73      -     93     60      0     15      0      0    405    210
    0     32     72      -     94     65      0      7      0      0    405    210
    0     32     73      -     96     65      0      6      0      0    405    210
    0     32     73      -     92     58      0     15      0      0    405    210
    0     32     73      -     94     62      0      7      0      0    405    210
    0     32     73      -     94     60      0      6      0      0    405    210
    0     32     73      -     97     65      0      0      0      0    405    210

This is not the deepstream-test3 sample.

I’ve run deepstream-test3 in WSL2, and the performance looks fine.

Why is this not sample 3? This is the original code which is in the Docker container…

So what is the problem with the Python application?

This is the original Python deepstream-test3 sample: deepstream_python_apps/apps/deepstream-test3 at master · NVIDIA-AI-IOT/deepstream_python_apps (github.com)

This is the same code… No difference…

So what can be the issue?

I don’t find any issue with deepstream-test3 in WSL2.

There is no “-v” option with deepstream-test3. Can you upload the “deepstream/config.yml” configuration?

@Fiona.Chen

I’ve also observed slower performance on WSL2 compared to native Ubuntu 22.04 while testing with DeepStream 7.0.

The environment setup for both WSL2 and Ubuntu is as follows:

Hardware Platform: GPU (NVIDIA GeForce RTX 3050)
DeepStream Version 7.0
Driver Version: 550.54.15
CUDA Version: 12.4
Docker: deepstream:7.0-triton-multiarch

I used the following link to set up and run DeepStream on WSL2 (Windows 11).

Additionally, this testing was done on the same machine, so differences in hardware specifications should not account for the performance difference.

I used the DeepStream Python test 3 sample (with a minor modification to measure time) with PeopleNet (INT8 precision) as the pgie, and tested on two 1080p videos of 1 min and 30 min duration respectively. The attached log files show the results for each platform.

test3_1min_1080p_wsl2.txt (2.6 KB)
test3_30min_1080p_wsl2.txt (3.7 KB)
test3_30min_1080p_ubuntu22.txt (3.5 KB)
test3_1min_1080p_ubuntu22.txt (2.7 KB)

As you can see, Ubuntu 22.04 is approximately 1.4 times faster than WSL2 in terms of processing/inference time.
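For reference, a sketch of how such a comparison could be timed and expressed as a ratio. This is not the modification used in the attached logs; `wall_clock` and `slowdown` are hypothetical helpers, the `time.sleep` call stands in for running the pipeline, and the 84 s / 60 s figures are placeholders chosen only to illustrate a 1.4x ratio.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_clock(results, label):
    """Store the elapsed wall-clock seconds of the block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def slowdown(wsl2_seconds, native_seconds):
    """How many times longer the WSL2 run took than the native run."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return wsl2_seconds / native_seconds


results = {}
with wall_clock(results, "demo"):
    time.sleep(0.01)  # stand-in for the pipeline run (e.g. the main loop)
print(f"demo took {results['demo']:.3f} s")

# Placeholder timings, not values from the attached logs.
ratio = slowdown(wsl2_seconds=84.0, native_seconds=60.0)
print(f"WSL2 took {ratio:.2f}x as long as native")
```

Timing the whole run with a monotonic clock keeps the comparison independent of the per-stream PERF printouts, which only report instantaneous FPS.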

The performance may be a little worse in WSL2. This is a known issue. DeepStream on WSL is currently alpha-release quality. We may improve the performance in the future.

Hi Fiona, thanks again for releasing the WSL2 version for us to test. May I ask if there is any way to run DeepStream without Docker on WSL2?

We have never tried that. You could try installing DeepStream on WSL2 by following the Installation page of the DeepStream 6.4 documentation.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
