I am facing an issue when trying to start a container in a remote environment using NVIDIA AI Workbench after building it. Base environment used:
PyTorch 2.1 with CUDA 12.2 (v1.0.2) | Ubuntu 22.04 | Python 3
Upon launching the environment, I receive the following error message:
No GPUs Available
Not enough GPU resources are available. You can continue without GPUs. Additionally, you can cancel to manually stop projects to free up GPU resources.
This error occurs despite GPUs being available on the server. The output of the nvidia-smi command confirms that both GPUs (an NVIDIA GeForce RTX 3060 and a GeForce RTX 3060 Ti) are detected and show minimal memory usage.
Interestingly, when I run the container directly from the terminal, it successfully recognizes the GPUs. This indicates that there may be an issue with how the GPU resources are allocated or recognized specifically within NVIDIA AI Workbench.
I would appreciate any assistance in troubleshooting this issue to ensure the GPUs are available for the container in the AI Workbench environment.
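For reference, the direct-from-terminal check that does succeed for me looks roughly like this. The image tag below is only a placeholder; substitute the image Workbench built for your project:

```shell
# Run nvidia-smi inside a container to confirm GPU passthrough works
# outside of AI Workbench. IMAGE is a placeholder tag, not the exact
# image Workbench built.
IMAGE="nvcr.io/nvidia/pytorch:23.10-py3"
if command -v docker >/dev/null 2>&1; then
  docker run --rm --gpus all "$IMAGE" nvidia-smi \
    || echo "container could not see the GPUs"
else
  echo "docker is not on PATH"
fi
```

If this works from the terminal but Workbench still reports "No GPUs Available", the problem is presumably in how Workbench itself queries or allocates GPUs, not in the container runtime.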
Please tick the appropriate box to help us categorize your post
[x] Bug or Error
[ ] Feature Request
[ ] Documentation Issue
[ ] Other

logs.txt (7.9 KB)
Hi,
I am having the exact same problem. The GPU is on a remote Ubuntu machine, and I am accessing it from a Workbench environment on my Ubuntu laptop.
I have installed the NVIDIA Container Toolkit, so it doesn't seem to be a Docker issue. Some environment info below:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02 Driver Version: 550.107.02 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3080 Off | 00000000:17:00.0 Off | N/A |
| 0% 28C P8 5W / 370W | 10MiB / 10240MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2988 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+
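For completeness, this is roughly how I sanity-checked the toolkit install. The paths are the toolkit defaults on Ubuntu; adjust for your system:

```shell
# Confirm the NVIDIA Container Toolkit CLI is installed
if command -v nvidia-ctk >/dev/null 2>&1; then
  nvidia-ctk --version
else
  echo "nvidia-ctk not found"
fi

# After `nvidia-ctk runtime configure --runtime=docker`, the nvidia
# runtime should be registered in Docker's daemon config:
CONFIG=/etc/docker/daemon.json
[ -f "$CONFIG" ] && grep -i nvidia "$CONFIG" \
  || echo "no nvidia runtime entry found in $CONFIG"
```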
I am facing the same issue but with a different configuration. I have installed NVIDIA AI Workbench on my local MacBook Pro (M3), I am trying to run the example RAG repo, and I am using Docker. I am getting the error shown in the attached screenshot.
Hi, it looks like you are working locally on an ARM-based Mac.
If you are working locally on the Mac, please note that you need a dedicated GPU to run this project, and your current system does not have one.
Do you have a remote Ubuntu box you have access to with a GPU? If so you can connect to it from your Mac and use that location for compute for this project. You can read more about how to do so here.
(Also, please note that ARM-based Macs are generally unsupported by HF TGI, the base container image for this project. But it seems the project built fine for you, so it may be OK.)
I used Docker Desktop's macOS socket and configured my Workbench to point to that file, which allowed me to build the project. I am using NGC to access a GPU remotely, but I am not sure my NGC setup is configured correctly.
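In case it helps others on macOS: Docker Desktop exposes a per-user socket rather than the traditional system one, so a quick check like this (path is the Docker Desktop default) tells you what to point Workbench at:

```shell
# Docker Desktop on macOS puts its socket under the user's home
# directory; /var/run/docker.sock may be a symlink or absent.
SOCK="$HOME/.docker/run/docker.sock"
if [ -S "$SOCK" ]; then
  echo "Docker Desktop socket found at $SOCK"
else
  echo "No socket at $SOCK; check Docker Desktop's settings"
fi
```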