I am trying to run the h2oGPT chatbot on my computer, but I am having trouble getting Docker to use the NVIDIA graphics card. The error I get is "Auto-detected mode as 'legacy'", which suggests the NVIDIA container runtime cannot communicate with the graphics card. I suspect the NVIDIA drivers are not installed or configured correctly, yet nvidia-smi still works. Here is the error message:
(base) user@user-16GB-computer:~/dev/project/chatbot-rag/v2_h2ogpt/h2ogpt-docker$ sudo docker compose up
[sudo] password for user:
Attaching to h2ogpt
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
It also seems that I can't manage the NVIDIA services:
(base) user@user-16GB-computer:~/dev/project/chatbot-rag/v2_h2ogpt/h2ogpt-docker$ sudo systemctl start nvidia-container-runtime
Failed to start nvidia-container-runtime.service: Unit nvidia-container-runtime.service not found.
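As far as I know, nvidia-container-runtime is not shipped as a systemd service at all; it is a binary that Docker invokes directly, so the "Unit not found" message is expected. A quick sanity check (assuming the Debian/Ubuntu packaging of the NVIDIA Container Toolkit) would be to look for the binaries and the package instead:

```shell
# No systemd unit is expected; check the toolkit binaries are on PATH instead
which nvidia-container-runtime nvidia-container-cli nvidia-container-runtime-hook

# On Debian/Ubuntu, verify the packages themselves are installed
dpkg -l | grep nvidia-container
```

If `which` prints nothing, the toolkit is likely missing or not on root's PATH.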
But the driver itself seems to work:
(base) user@user-16GB-computer:~/dev/project/chatbot-rag/v2_h2ogpt/h2ogpt-docker$ nvidia-smi
Mon Jan 15 18:29:04 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3080 ... Off | 00000000:01:00.0 Off | N/A |
| N/A 43C P0 N/A / 125W | 8MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2440 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
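The container error above complains about libnvidia-ml.so.1 being missing. Since nvidia-smi works on the host, one thing worth checking (this is a guess on my part) is whether that library is registered in the host's dynamic linker cache, which is what nvidia-container-cli relies on:

```shell
# Check whether the NVML library is in the linker cache
ldconfig -p | grep libnvidia-ml

# If nothing shows up, find the file manually and refresh the cache
find /usr/lib -name 'libnvidia-ml.so*' 2>/dev/null
sudo ldconfig
```

An empty result from the first command would explain the "cannot open shared object file" error even though the driver is loaded.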
Here is part of my docker-compose.yaml:
version: '3'
services:
  h2ogpt:
    image: gcr.io/vorvan/h2oai/h2ogpt-runtime:latest
    container_name: h2ogpt
    shm_size: '2gb'
    environment:
      - ANONYMIZED_TELEMETRY=False
      - HF_DATASETS_OFFLINE=1
      - TRANSFORMERS_OFFLINE=1
    volumes:
      - $HOME/.cache:/workspace/.cache
      - ./data/models:/workspace/models:ro
      - ./data/save:/workspace/save
      - ./data/user_path:/workspace/user_path
      - ./data/db_dir_UserData:/workspace/db_dir_UserData
      - ./data/users:/workspace/users
      - ./data/db_nonusers:/workspace/db_nonusers
      - ./data/llamacpp_path:/workspace/llamacpp_path
      - ./data/h2ogpt_auth:/workspace/h2ogpt_auth
    ports:
      - 7860:7860
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: >
      /workspace/generate.py
      --base_model=mistralai/Mistral-7B-Instruct-v0.2
      --hf_embedding_model=intfloat/multilingual-e5-large
      --load_4bit=True
      --use_flash_attention_2=True
      --score_model=None
      --top_k_docs=10
      --max_input_tokens=2048
      --visible_h2ogpt_logo=False
      --dark=True
      --visible_tos_tab=True
      --langchain_modes="['UserData', 'LLM']"
      --langchain_mode_paths="{'UserData':'/workspace/user_path/sample_docs'}"
      --langchain_mode_types="{'UserData':'shared'}"
      --enable_pdf_doctr=off
      --enable_captions=False
      --enable_llava=False
      --use_unstructured=False
      --enable_doctr=False
      --enable_transcriptions=False
      --enable_heap_analytics=False
      --use_auth_token=hf_XXXX
      --prompt_type=mistral
      --pre_prompt_query="Use the following pieces of information to answer; don't try to make up an answer, just say I don't know if you don't know."
      --prompt_query="Cite relevant passages from context to justify your answer."
      --use_safetensors=False
      --verbose=True
    networks:
      - h2ogpt-net
I don't know if it is related, but right now my computer is very slow. I had read something about the GeForce drivers bringing a bunch of modules running in the background that serve no purpose and slow down the machine.
My /etc/docker/daemon.json didn't look right:
ubuntu@ubuntu-GE66-Raider-11UH:~/dev/chatbot-rag/v2_h2ogpt/h2ogpt-docker$ cat /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
}
}
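Rather than hand-editing daemon.json, the NVIDIA Container Toolkit ships an `nvidia-ctk` command that regenerates the runtime entry in that file for you (assuming a recent toolkit version is installed):

```shell
# Rewrite the "nvidia" runtime entry in /etc/docker/daemon.json
sudo nvidia-ctk runtime configure --runtime=docker

# Docker only re-reads daemon.json on restart
sudo systemctl restart docker
```

This avoids typos in the `path` value, since the tool resolves the runtime binary location itself.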
I modified the `path` entry in /etc/docker/daemon.json and ran the command again:
ubuntu@ubuntu-GE66-Raider-11UH:~/dev/chatbot-rag/v2_h2ogpt/h2ogpt-docker$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
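The "driver not loaded" message usually points at the nvidia kernel module rather than at Docker. A way to check (my assumption, since nvidia-smi worked earlier and the module may have been unloaded by the driver change) would be:

```shell
# See whether the nvidia kernel modules are currently resident
lsmod | grep nvidia

# If nothing is listed, try loading the module manually
sudo modprobe nvidia

# Then confirm the driver answers again
nvidia-smi
```

If `modprobe` fails, the downgraded driver may not match the running kernel, and a reboot or a rebuild of the DKMS module would be needed.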
So I tried the third suggested fix, downgrading my NVIDIA driver, but then got an error about the runtime hook missing:
ubuntu@ubuntu-GE66-Raider-11UH:~/dev/chatbot-rag/v2_h2ogpt/h2ogpt-docker$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
docker: Error response from daemon: exec: "nvidia-container-runtime-hook": executable file not found in $PATH.
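The nvidia-container-runtime-hook binary ships with the nvidia-container-toolkit package, so this last error suggests the toolkit was removed or broken alongside the driver downgrade. A possible recovery on an apt-based system (a sketch, assuming the NVIDIA apt repository is already configured):

```shell
# Reinstall the toolkit that provides nvidia-container-runtime-hook
sudo apt-get update
sudo apt-get install --reinstall nvidia-container-toolkit

# Re-register the runtime with Docker and restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Confirm the hook is back on PATH
which nvidia-container-runtime-hook
```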