Can't start environment with GPU

Please provide the following info (tick the boxes after creating this topic):

Submission Type
Bug or Error
Feature Request
Documentation Issue
Question
Other

Workbench Version
Desktop App v0.50.17
CLI v0.21.3
Other

Host Machine operating system and location
Local Windows 11
Local Windows 10
Local macOS
Local Ubuntu 22.04
Remote Ubuntu 22.04
Other

Summary of the issue
When I start the environment with the GPU enabled, an error occurs. When I choose not to use the GPU, the environment starts normally.

Error message
Here is the error log from workbench.log:

{"level":"info","time":"2024-06-06T13:44:10+08:00","message":"project 'my-project' is requesting 1 GPUs. Added runtime selection for GPUs."}
{"level":"warn","runtimeInfoPath":"/home/rd/.nvwb/project-runtime-info/my-project-e834f8624793c4f0a48dab7b2b5d09801cf98164","time":"2024-06-06T13:44:10+08:00","message":"No git remote operation output files were found."}
{"level":"error","error":"input: startProject exit status 125","time":"2024-06-06T13:44:11+08:00","message":"GIN-Graphql request failed"}
{"level":"info","time":"2024/06/06 - 13:44:11","status":200,"latency":"261.362116ms","client-ip":"127.0.0.1","method":"POST","path":"/v1/query","time":"2024-06-06T13:44:11+08:00","message":"GIN-Request"}
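When workbench.log grows large (the debug log later in this thread is 1.1 MB), it can help to pull out only the error-level entries. The sketch below is a minimal, hypothetical helper, not part of AI Workbench. It assumes the raw log has one JSON object per line, as the excerpt above suggests. The curly quotes in a forum copy are a rendering artifact, and lines that do not parse as JSON are simply skipped.

```python
import json
from typing import Iterable, Iterator


def error_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield log records whose "level" field is "error".

    Assumes one JSON object per line; blank lines and lines that are
    not valid JSON (e.g. plain-text output) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("level") == "error":
            yield record


def print_errors(path: str) -> None:
    """Print time, error and message for each error-level entry in a log file."""
    with open(path, encoding="utf-8") as f:
        for rec in error_entries(f):
            print(rec.get("time"), rec.get("error"), rec.get("message"))
```

Run against the excerpt above, this would report only the `startProject exit status 125` line. Note that when a key appears twice in one line, as `time` does in the GIN-Request entry, Python's `json` module keeps the last value.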


Hi Leo

Sorry for taking so long to get back to you.

Can you please do the following and then send us the logs?

  • Open a terminal and activate your local context in debug mode
    • nvwb --debug activate local
  • Open the Project and start the container as you usually do in the UI with the GPU enabled
    • Or you can do it in the CLI as follows:
      • nvwb open <project_name>
      • nvwb start jupyterlab

This will give us more information on what’s happening.

Thanks

Hi

Here are my logs.

workbench.log (18.7 KB)

Thanks!

Hi Leo,

I didn’t give you the full instructions, sorry.

To use --debug mode, you first need to completely shut down Workbench on your local machine.

You can do this by fully closing and quitting the Desktop application.

Or, if you have an active session in your terminal, you can run:

  • nvwb -f --shutdown deactivate

This will force close everything, including running Projects.

Then, activate the local context in debug mode and fire up the container with the GPUs enabled again.

Sorry for missing this.

Hi

Here are my logs. I hadn’t set debug mode correctly before, sorry.

workbench.log (1.1 MB)

Thanks!

Hi there,
Your logs show a problem engaging the GPU, a GeForce 3080, in AI Workbench, even though your CUDA runtime and driver are current. Could we run a Docker test independent of AI Workbench to help diagnose this? Running nvidia-smi inside a Docker container will show whether Docker itself can recognize the GPU. Please run

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.07-py3 nvidia-smi

and send the results.

If this test fails to see CUDA initialize in the PyTorch container, the next step is to troubleshoot libnvidia-container; see the Troubleshooting page of the NVIDIA Container Toolkit 1.15.0 documentation.
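To script this check and interpret its result, here is a minimal Python sketch. The helper names are hypothetical. It also assumes, without confirmation from the logs, that the `startProject exit status 125` in workbench.log is passed through from the underlying `docker run`. According to the Docker documentation, `docker run` reserves 125 (error with the Docker daemon itself), 126 (contained command cannot be invoked) and 127 (contained command not found); any other code is the exit status of the command inside the container.

```python
import subprocess

# Exit codes that `docker run` itself reserves; any other non-zero value is
# the exit status of the command run inside the container (here nvidia-smi).
DOCKER_RUN_EXIT_CODES = {
    125: "error with the Docker daemon itself",
    126: "contained command could not be invoked",
    127: "contained command could not be found",
}


def describe_docker_exit(code: int) -> str:
    """Translate a `docker run` exit code into a short explanation."""
    if code == 0:
        return "success"
    return DOCKER_RUN_EXIT_CODES.get(
        code, f"contained command exited with status {code}"
    )


def smoke_test_command(image: str = "nvcr.io/nvidia/pytorch:23.07-py3") -> list[str]:
    # Same test as above, minus -it: no interactive TTY is needed when the
    # output is captured by a script.
    return ["docker", "run", "--gpus", "all", "--rm", image, "nvidia-smi"]


def run_gpu_smoke_test(image: str = "nvcr.io/nvidia/pytorch:23.07-py3") -> tuple[int, str]:
    """Run nvidia-smi in a container; return (exit code, combined output).

    Raises FileNotFoundError if the docker CLI is not on PATH.
    """
    proc = subprocess.run(smoke_test_command(image), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr
```

For example, `code, out = run_gpu_smoke_test()` followed by `print(describe_docker_exit(code))`. A 125 here, together with a daemon error mentioning nvidia-container-cli in `out`, would point at the container toolkit rather than at AI Workbench.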

Hi
Here is the result of running the command.

Thanks!

Let's try a fix I have found helpful. Edit /etc/nvidia-container-runtime/config.toml with vi or nano, as in

sudo vi /etc/nvidia-container-runtime/config.toml

In the [nvidia-container-cli] section, modify the line

no-cgroups = true

to

no-cgroups = false

and save the file.

Then run

sudo systemctl restart docker

and retry the container with the

docker run --gpus all -it … nvidia-smi

command from above.
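If you prefer to script the edit instead of using vi or nano, here is a minimal Python sketch. `enable_cgroups` and `patch_file` are hypothetical helper names. The sketch assumes the key is spelled `no-cgroups`, as in the stock file's `[nvidia-container-cli]` section, and that it runs with root privileges, since the file lives under /etc. It only rewrites an uncommented `no-cgroups = true` line and leaves commented-out defaults alone.

```python
import re

# Matches an uncommented "no-cgroups = true" line, keeping its indentation.
# [ \t]* is used instead of \s* so the match never swallows a newline.
_NO_CGROUPS_TRUE = re.compile(
    r"^([ \t]*)no-cgroups[ \t]*=[ \t]*true[ \t]*$", re.MULTILINE
)


def enable_cgroups(config_text: str) -> tuple[str, int]:
    """Return (new text, number of lines changed) with no-cgroups set to false."""
    return _NO_CGROUPS_TRUE.subn(r"\g<1>no-cgroups = false", config_text)


def patch_file(path: str = "/etc/nvidia-container-runtime/config.toml") -> int:
    """Apply enable_cgroups to a file in place; returns the number of changes."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    new_text, changed = enable_cgroups(text)
    if changed:
        with open(path, "w", encoding="utf-8") as f:
            f.write(new_text)
    return changed
```

After a non-zero return from `patch_file()`, Docker still needs the `sudo systemctl restart docker` step above before the change takes effect.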

Thanks for your reply. I reinstalled my system, and that solved my problem.

Thanks for your help over the past few days.