No GPUs Available

Problem Summary:

I am facing an issue when trying to start a container in a remote environment using NVIDIA AI Workbench after building it. The project uses the following base image:
PyTorch 2.1 Base with CUDA 12.2 (v1.0.2) | Ubuntu 22.04 | Python 3
Upon launching the environment, I receive the following error message:

No GPUs Available
Not enough GPU resources are available. You can continue without GPUs. Additionally, you can cancel to manually stop projects to free up GPU resources.

This error occurs despite having GPUs available on the server. The output from the nvidia-smi command confirms that both GPUs (NVIDIA GeForce RTX 3060 and GeForce RTX 3060 Ti) are detected and show minimal memory usage:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   39C    P8             18W /  170W |       2MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3060 Ti     Off |   00000000:04:00.0 Off |                  N/A |
|  0%   59C    P8             21W /  200W |       2MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Interestingly, when I run the container directly from the terminal, it successfully recognizes the GPUs. This indicates that there may be an issue with how the GPU resources are allocated or recognized specifically within NVIDIA AI Workbench.

I would appreciate any assistance in troubleshooting this issue to ensure the GPUs are available for the container in the AI Workbench environment.
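For anyone who wants to script the same driver-level check, nvidia-smi can emit machine-readable output via its `--query-gpu=name,memory.total --format=csv,noheader` flags. A minimal sketch follows; the `parse_gpus` helper is hypothetical, and the sample string simply mirrors the table above rather than a live query:

```python
import csv
import io

def parse_gpus(smi_csv: str):
    """Parse output of: nvidia-smi --query-gpu=name,memory.total --format=csv,noheader"""
    gpus = []
    for row in csv.reader(io.StringIO(smi_csv)):
        if row:  # skip blank lines
            gpus.append({"name": row[0].strip(), "memory": row[1].strip()})
    return gpus

# Sample string mirroring the table above; on the real host you would
# capture it with subprocess.run([...], capture_output=True, text=True).
sample = "NVIDIA GeForce RTX 3060, 12288 MiB\nNVIDIA GeForce RTX 3060 Ti, 8192 MiB\n"
print(parse_gpus(sample))
```

If this reports both GPUs but Workbench still refuses to allocate them, the problem is likely above the driver layer (container runtime or Workbench itself).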

Category: Bug or Error
logs.txt (7.9 KB)

Is this on a virtual desktop?

It is NVIDIA AI Workbench installed on Windows 11.

The screenshot doesn’t show which system that Workbench window is for. Is it the local system or a remote?

Regardless, let me echo back what I think you are doing.

  • You have Workbench installed locally on a Windows 11 system.
  • You have installed Workbench remotely on an Ubuntu system that has the two GPUs in question.
  • When you try to build and open a CUDA-enabled project on the remote, you get a “no GPUs found” error even though GPUs are present.

Is this assessment correct?

Thanks for your support!!!

Yes. I installed Workbench on the remote and connected to it. (Screenshot from the remote location.)

I can modify files, launch Jupyter (CPU-only), etc. on the remote through the NVIDIA AI Workbench program.

Running nvidia-smi on the remote shows both GPUs (screenshot from the remote).

As for the NVIDIA AI Workbench app: I installed the program and connected to the remote (screenshot from the NVIDIA AI Workbench program on Windows 11).


Q & A:

  • You have Workbench installed locally on a Windows 11 system. Yes
  • You have installed Workbench remotely on an Ubuntu system that has the two GPUs in question. Yes
  • When you try to build and open a CUDA-enabled project on the remote, you get a “no GPUs found” error even though GPUs are present. Yes

In general, my projects that need a GPU won’t run in Workbench.

Error when using the remote location:


OK, so I’m guessing this is a problem at the dependency level, likely something with Docker or the drivers.

Are you using Docker or Podman as the container runtime on the remote?

Hi,
I am having the exact same problem. I have the GPU on a remote Ubuntu environment and am accessing it from a Workbench environment on my laptop, which also runs Ubuntu.
I have installed the NVIDIA Container Toolkit, so it doesn’t seem to be a Docker issue. Some environment info below.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        Off |   00000000:17:00.0 Off |                  N/A |
|  0%   28C    P8              5W /  370W |      10MiB /  10240MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2988      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
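Since the Container Toolkit is installed but the GPU still isn’t reaching the container, it may be worth confirming that Docker itself was configured to use the NVIDIA runtime. Running `sudo nvidia-ctk runtime configure --runtime=docker` (then restarting the Docker daemon) should leave an entry like the following in /etc/docker/daemon.json. This is a sketch of the expected config; exact contents can differ by toolkit version:

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```

If that entry is missing, Docker will start containers without GPU support even though nvidia-smi works fine on the host, which matches the symptoms in this thread.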