Low GPU usage in TRTIS

I’m trying to run an NVIDIA Triton Inference Server on my brand new Asus G14 laptop with an RTX 2060 Max-Q, but unfortunately it doesn’t work well. I suspect a bottleneck in the driver, since GPU usage peaks at 2%.

Detailed problem analysis:

Of course, the TRTIS team won’t be able to do much if it’s a driver issue.

The thread you linked is long, with a lot of debugging and settings hacks along the way.
Would you mind sending a shorter version of the steps we would need to follow to reproduce the issue in-house?

Also, I’d like to point out that currently NVML is not yet supported. We are working to add the support though.

This is the summary:

  1. Install Windows 10 build 2004 and enable WSL2.
  2. Install the NVIDIA WSL2 driver following the official guide.
  3. Install the Ubuntu WSL2 distro.
  4. Install Triton on Ubuntu from the Docker repository.
  5. Try to run Triton. It fails because it cannot detect the GPU driver: the WSL2 driver is not in the usual paths that Triton checks.
  6. Try to fix Triton by adjusting the paths.
  7. It then runs, but uses only 2% of the GPU.
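
Step 5 can be checked before touching any paths. The following is a minimal sketch, not something taken from the thread: the directory list is an assumption (two common Linux library locations plus `/usr/lib/wsl/lib`, where the WSL2 driver is assumed to expose its libraries), and `find_driver_dirs` is a hypothetical helper name. It only reports which candidate directories contain a `libcuda.so*` file.

```python
import glob
import os

# Assumed search list, for illustration only: two common Linux library
# locations, then the directory where the WSL2 driver is assumed to
# expose its user-mode libraries.
CANDIDATE_DIRS = [
    "/usr/lib/x86_64-linux-gnu",
    "/usr/local/cuda/lib64",
    "/usr/lib/wsl/lib",
]


def find_driver_dirs(candidates=CANDIDATE_DIRS, pattern="libcuda.so*"):
    """Return the candidate directories that contain a file matching pattern."""
    return [d for d in candidates if glob.glob(os.path.join(d, pattern))]


if __name__ == "__main__":
    found = find_driver_dirs()
    if found:
        print("libcuda found in:", ", ".join(found))
    else:
        print("libcuda not found in any candidate directory")
```

If the library shows up only in a directory the server does not search, that would be consistent with the failure in step 5. Which paths actually need adjusting in step 6 depends on the setup and is not spelled out here.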

Disregard the nvml error, that’s not what I’m after.

@kmorozov any news on this?

Hello!

Assuming your workload was suffering from the small-workload performance issue we had in the previous driver, you should try the new driver, ‘Preview for CUDA on WSL Updated for Performance’.
That driver has a couple of new optimizations that strongly boost the performance of apps that were bottlenecked by small-workload launch overhead.

This is far from the last set of optimizations we plan to make, so performance should keep improving. There is a good chance the new driver will improve your GPU utilization.
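
To illustrate why launch overhead can dominate small workloads, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which every kernel launch is strictly serial (a fixed overhead followed by the kernel itself, with no overlap). The function name and all numbers are hypothetical and are not measurements from any driver.

```python
def estimated_utilization(kernel_time_us, launch_overhead_us):
    """Fraction of wall time spent executing kernels under a serial launch model.

    Assumes each launch costs launch_overhead_us, followed by kernel_time_us
    of GPU work, with no overlap between the two.
    """
    if kernel_time_us < 0 or launch_overhead_us < 0:
        raise ValueError("times must be non-negative")
    total = kernel_time_us + launch_overhead_us
    return kernel_time_us / total if total else 0.0


if __name__ == "__main__":
    # Illustrative values only: a 10 us kernel with increasing launch overhead.
    for overhead_us in (5.0, 50.0, 500.0):
        util = estimated_utilization(10.0, overhead_us)
        print(f"overhead {overhead_us:6.1f} us -> utilization {util:.1%}")
```

Under this model, utilization falls quickly once the per-launch overhead is much larger than the kernel it launches, which is the kind of situation a driver-side reduction in launch overhead would help with.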

Let us know how it goes!

Thank you for the wonderful news, I will check it out.

Dear @rboissel

Triton does indeed now start correctly on the new driver.

Unfortunately, my laptop shares memory between the system and the GPU. In my current setup I run Docker for development, and now I need to run a second Docker instance (in WSL) for Triton. Running two Docker instances exhausts the 16 GB of memory in my Asus G14 with the 2060 Max-Q.

It looks like I will need to wait until the driver supports the Docker Desktop WSL 2 backend, so that I can use a single Docker instance.

Any ideas when this might happen?

cc @kmorozov