Low GPU usage in TRTIS

Turowicz · June 29, 2020, 2:14pm

I’m trying to run an Nvidia Triton Inference Server on my brand new Asus G14 laptop with an RTX 2060 Max-Q, but unfortunately it doesn’t work well. I suspect the driver has a bottleneck, since the GPU usage peaks at 2%.

Detailed problem analysis:

github.com/triton-inference-server/server

WSL2 + CUDA

opened 08:23AM - 25 Jun 20 UTC

closed 05:14PM - 04 Sep 20 UTC

turowicz

**Description**
I'm trying to run Triton on the recently released WSL2 and a specific CUDA driver for it. The examples provided by NVIDIA work fine. Details: https://docs.nvidia.com/cuda/wsl-user-guide/index.html

**Triton Information**
What version of Triton are you using? nvcr.io/nvidia/tritonserver:20.03-py3
Are you using the Triton container or did you build it yourself? nvcr.io/nvidia/tritonserver:20.03-py3

**To Reproduce**

`sudo docker run -d --restart always --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --name triton -p 8000:8000 -p 8001:8001 -p 8002:8002 nvcr.io/nvidia/tritonserver:20.03-py3`

OR

`sudo nvidia-docker run -d --restart always --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --name triton -p 8000:8000 -p 8001:8001 -p 8002:8002 nvcr.io/nvidia/tritonserver:20.03-py3`

```
=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.03 (build 11042949)

Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use 'nvidia-docker run' to start this container; see
https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker ..
```

**Expected behavior**
I expect Triton to start up?

Of course, the TRTIS team won’t be able to do much if it’s a driver issue.

kmorozov · July 8, 2020, 9:10pm

The thread you linked is long, with a lot of debugging and hacking on the settings.
Would you mind sending a shorter version of the steps we would need to follow to reproduce the issue in house?

Also, I’d like to point out that NVML is not yet supported. We are working on adding support, though.

Turowicz · July 9, 2020, 7:55am

This is the summary:

1. Install Windows 10 build 2004 and enable WSL2.
2. Install the Nvidia WSL2 driver following the official guide.
3. Install an Ubuntu WSL2 distro.
4. Install Triton on Ubuntu from the Docker repository.
5. Try to run Triton. It will fail because it does not detect the GPU driver: the WSL2 driver doesn’t appear in the usual paths that Triton checks.
6. Try to fix Triton by adjusting the paths.
7. It will run, but it will only use 2% of the GPU.

Disregard the NVML error; that’s not what I’m after.
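
The driver-detection failure described in the summary can be checked from the container's startup output. The following is only a minimal sketch: it assumes the startup banner quoted in the GitHub issue is what ends up in the container log, and `driver_warning_present` is a hypothetical helper name, not part of Triton or Docker.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the text on stdin
# contains the "driver not detected" warning quoted in the GitHub issue.
driver_warning_present() {
    grep -q "The NVIDIA Driver was not detected"
}

# Assumed usage against the container named "triton" from the issue's
# docker run command (requires a running Docker daemon):
#   sudo docker logs triton 2>&1 | driver_warning_present \
#       && echo "Triton did not detect the GPU driver"
```

If the helper reports the warning, the container started without GPU access, so any utilization measurement taken afterwards would not be meaningful.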

Turowicz · August 28, 2020, 2:38pm

@kmorozov, any news on this?

rboissel · September 3, 2020, 1:05am

Hello!

Assuming your workload was suffering from the small-workload performance issue we had on the previous driver, you should try the new driver, ‘Preview for CUDA on WSL Updated for Performance’.
That driver has a couple of new optimizations that strongly boost the performance of apps that were bottlenecked by the small-workload launch overhead.

This is far from the last set of optimizations we plan to make, so performance should keep increasing. But there is a good chance that this driver will improve your GPU utilization now.

Let us know how it goes!
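
To illustrate why launch overhead can cap GPU utilization for small workloads, here is a toy model, not a measurement of this driver. It assumes each kernel launch pays a fixed overhead during which the GPU is idle; the function name `estimated_utilization` and all the microsecond figures are hypothetical.

```shell
#!/bin/sh
# Toy model (assumption): percentage of wall time the GPU spends computing
# when every launch pays a fixed overhead during which the GPU is idle.
# $1 = compute time per launch, $2 = overhead per launch (same unit for both).
estimated_utilization() {
    awk -v k="$1" -v o="$2" 'BEGIN { printf "%.1f\n", 100 * k / (k + o) }'
}

# Hypothetical numbers: 2 us of work behind 98 us of launch overhead.
estimated_utilization 2 98   # prints 2.0
# Same work with the overhead reduced to 8 us.
estimated_utilization 2 8    # prints 20.0
```

Under this model, shrinking the per-launch overhead raises utilization even though the amount of GPU work per launch stays the same.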

Turowicz · September 3, 2020, 7:35am

Thank you for the wonderful news, I will check it out.

Turowicz · September 3, 2020, 9:29am

Dear @rboissel

Triton does indeed now start correctly on the new driver.

Unfortunately, my laptop shares memory between the system and the GPU. In my current setup, I run Docker for development, and now I need to run another Docker instance (in WSL) for Triton. Running two Docker instances exhausts the 16 GB of memory in my Asus G14 with the 2060 Max-Q.

It looks like I will need to wait until the driver supports the Docker Desktop WSL 2 backend, so that I can use a single Docker instance.

Any ideas when this might happen?

cc @kmorozov

Cheers
