Please provide the following info (tick the boxes after creating this topic):
Submission Type
Bug or Error
Feature Request
Documentation Issue
Question
Other
Workbench Version
Desktop App v0.50.17
CLI v0.21.3
Other
Host Machine operating system and location
Local Windows 11
Local Windows 10
Local macOS
Local Ubuntu 22.04
Remote Ubuntu 22.04
Other
Summary of the issue
When I start the environment with the GPU enabled, an error occurs; when I choose not to use the GPU, the environment starts normally.
Error message
Here is the error log from workbench.log:
{"level":"info","time":"2024-06-06T13:44:10+08:00","message":"project 'my-project' is requesting 1 GPUs. Added runtime selection for GPUs."}
{"level":"warn","runtimeInfoPath":"/home/rd/.nvwb/project-runtime-info/my-project-e834f8624793c4f0a48dab7b2b5d09801cf98164","time":"2024-06-06T13:44:10+08:00","message":"No git remote operation output files were found."}
{"level":"error","error":"input: startProject exit status 125","time":"2024-06-06T13:44:11+08:00","message":"GIN-Graphql request failed"}
{"level":"info","time":"2024/06/06 - 13:44:11","status":200,"latency":"261.362116ms","client-ip":"127.0.0.1","method":"POST","path":"/v1/query","time":"2024-06-06T13:44:11+08:00","message":"GIN-Request"}
Screenshots
Hi Leo,
Sorry for taking so long to get back to you.
Can you please do the following and then send us the logs?
- Open a terminal and activate your local context in debug mode:
nvwb --debug activate local
- Open the Project and start the container as you usually do in the UI with the GPU enabled
- Or you can do it in the CLI as follows:
nvwb open <project_name>
nvwb start jupyterlab
This will give us more information on what’s happening.
Thanks
Hi Leo,
I didn’t give you full instructions. Sorry.
In order to set the --debug mode, you first need to completely shut down Workbench on your local machine.
You can do this by fully closing and quitting the Desktop application.
Or if you have an active session in your terminal you can do the following:
nvwb -f --shutdown deactivate
This will force close everything, including running Projects.
Then, activate the local context in debug mode and fire up the container with the GPUs enabled again.
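Putting the two posts together, the full sequence looks roughly like this (a sketch using only the commands above; replace <project_name> with your project's name):
nvwb -f --shutdown deactivate
nvwb --debug activate local
nvwb open <project_name>
nvwb start jupyterlab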
Sorry for missing this.
Hi
Here are my logs. I didn't set the debug mode correctly, sorry.
workbench.log (1.1 MB)
Thanks!
Hi there,
Your logs show a problem engaging the GPU (a GeForce RTX 3080) in AI Workbench, even though your CUDA runtime and driver are current. Could we run a Docker test independent of AI Workbench to help diagnose? Running nvidia-smi inside a container will tell us whether the Docker environment can recognize the GPU. Please run
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.07-py3 nvidia-smi
and send the results.
If this test fails to see CUDA initialize in the PyTorch container, the next step is to troubleshoot the NVIDIA Container Toolkit (libnvidia-container): see Troubleshooting — NVIDIA Container Toolkit 1.15.0 documentation.
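As an extra sanity check (my suggestion, assuming the NVIDIA driver is installed on the host), you can also confirm that the host itself sees the GPU before involving Docker:
nvidia-smi
If that fails on the host as well, the problem is at the driver level rather than in the container runtime.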
Hi
Here is my result from running the command.
Thanks!
Let's try a fix for this that I have found helpful. Edit /etc/nvidia-container-runtime/config.toml using vi or nano, as in
sudo vi /etc/nvidia-container-runtime/config.toml
Modify the line
no-cgroups = true
to
no-cgroups = false
and save the file.
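For reference, the relevant section of config.toml should look roughly like this after the edit (a sketch; other keys omitted and contents may vary by Container Toolkit version):
[nvidia-container-cli]
# false: let libnvidia-container manage device cgroups itself
no-cgroups = false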
Then restart Docker:
sudo systemctl restart docker
and retry the
docker run --gpus all -it … nvidia-smi
command from above.
Thanks for your reply. I reinstalled my system and that solved my problem.
Thanks for your help over the past few days.