Please provide the following information when requesting support.
Hardware - GPU (V100)
Hardware - CPU
Operating System
Riva Version
TLT Version (if relevant)
How to reproduce the issue ? (This is for errors. Please share the command and the detailed log here)
Hi there,
This is my first time posting here. I am trying to set up the AI Chatbots with RAG - Docker Workflow. I have been through the requirements and prerequisites and followed the guide AI Chatbot - Docker workflow | NVIDIA NGC.
When I deploy with
USERID=$(id -u) docker compose -f docker-compose-nim-ms.yaml up -d ranking-ms
and then
cd rag-app-text-chatbot-llamaindex/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
I get an error.
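The error and traceback below are from the container logs; I pulled them with something like this (assuming the failing service is the one started by the local-nim profile in the same compose file):
docker compose --profile local-nim --profile milvus logs -f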
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-SXM2-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157] Traceback (most recent call last):
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 100, in init_device
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]     _check_if_gpu_supports_dtype(self.model_config.dtype)
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]     raise ValueError(
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157] ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-SXM2-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
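For reference, this matches the V100's hardware limit. I double-checked the compute capability with a quick one-liner (assuming PyTorch is available, e.g. inside the container; device index 0 is just my single-GPU case):
python3 -c "import torch; print(torch.cuda.get_device_capability(0))"   # prints (7, 0) on V100; bfloat16 needs (8, 0) or newer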
So I added the dtype settings to the compose file:
nemollm-inference:
  container_name: nemollm-inference-microservice
  image: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
  volumes:
    - ${MODEL_DIRECTORY}:/opt/nim/.cache
  user: "${USERID}"
  ports:
    - "8000:8000"
  expose:
    - "8000"
  environment:
    NGC_API_KEY: ${NGC_API_KEY}
    MODEL_DTYPE: half # added to force float16
    NIM_PRECISION: fp16 # additional safeguard
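To confirm those variables actually reach the running container (using the nemollm-inference service name from the snippet above; the exact -f/--profile flags may need to match your setup), I checked with something like:
docker compose exec nemollm-inference env | grep -E 'MODEL_DTYPE|NIM_PRECISION'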
But it doesn't work. I have tried all the options I can find and I am stuck. I would be grateful for a workaround, please.