NVML Issue on WSL2

I’m attempting to generate augmentations using the NVIDIA TAO Toolkit in my WSL2 Ubuntu installation and am running into this error:
The nvml requested operation is not available on target device
I’m able to use TAO in my Ubuntu installation in WSL2 to train a model and convert it to ONNX or TensorRT, but I’m not able to generate augmentations using TAO Data Services.
A strange detail is that the nvidia-smi output in WSL2 and in the TAO container does not match the output in my Windows terminal: the NVIDIA-SMI version is 550.119 in WSL2 versus 553.09 on Windows, although both report driver version 553.09.
Here is my WSL2/TAO container output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.119                Driver Version: 553.09         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 3500 Ada Gene...    On  |   00000000:01:00.0 Off |                  Off |
| N/A   47C    P3             19W /   91W |       0MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

And here is my Windows output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 553.09                 Driver Version: 553.09         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 3500 Ada Gene...  WDDM  |   00000000:01:00.0 Off |                  Off |
| N/A   58C    P3             16W /   85W |      11MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     11040    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
+-----------------------------------------------------------------------------------------+

• Hardware: NVIDIA RTX 3500 Ada Generation Laptop GPU
• Augmentation spec file
aug_spec.txt (935 Bytes)

Could you please set the environment variable below and retry?
export DALI_DISABLE_NVML=1

This solved the issue. Since I’m using WSL2, I needed to add the variable to my .tao_mounts.json so that it is visible inside the TAO containers:

    "Envs": [
        {
            "variable": "DALI_DISABLE_NVML",
            "value": "1"
        }
    ],
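For anyone who would rather script this change than edit the file by hand, here is a minimal sketch in Python. It assumes the file is plain JSON with an optional top-level "Envs" list shaped like the snippet above; the helper names `add_env` and `update_mounts_file` are hypothetical and not part of TAO.

```python
import json
from pathlib import Path


def add_env(config: dict, variable: str, value: str) -> dict:
    """Return a copy of config with `variable` set to `value` in its "Envs" list.

    Assumes "Envs" is a list of {"variable": ..., "value": ...} objects,
    as in the snippet above. The input dict is left unmodified.
    """
    updated = dict(config)
    envs = [dict(entry) for entry in config.get("Envs", [])]
    for entry in envs:
        if entry.get("variable") == variable:
            entry["value"] = value  # overwrite an existing entry
            break
    else:
        envs.append({"variable": variable, "value": value})
    updated["Envs"] = envs
    return updated


def update_mounts_file(path: Path, variable: str, value: str) -> None:
    """Read the JSON file at `path` (if present), add the variable, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_env(config, variable, value), indent=4) + "\n")
```

Assuming the file lives at ~/.tao_mounts.json, calling `update_mounts_file(Path.home() / ".tao_mounts.json", "DALI_DISABLE_NVML", "1")` would add the entry while keeping any existing keys such as the mounts. Back up the file first, since the script rewrites it.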
