Please provide the following information when requesting support.
Hardware - GPU (V100)
Hardware - CPU
Operating System
Riva Version
TLT Version (if relevant)
How to reproduce the issue ? (This is for errors. Please share the command and the detailed log here)
Hi there,
This is my first time posting here. I am trying to set up the AI Chatbots with RAG - Docker Workflow. I have been through the requirements and prerequisites and followed the guide AI Chatbot - Docker workflow | NVIDIA NGC.
When I deploy with
USERID=$(id -u) docker compose -f docker-compose-nim-ms.yaml up -d ranking-ms
and then
cd rag-app-text-chatbot-llamaindex/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
I get an error.
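The error and traceback below are from the container logs; I pulled them with something like this (assuming the failing service is the one started by the local-nim profile in the same compose file):
docker compose --profile local-nim --profile milvus logs -f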
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-SXM2-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157] Traceback (most recent call last):
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 100, in init_device
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]     _check_if_gpu_supports_dtype(self.model_config.dtype)
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157]     raise ValueError(
(RayWorkerWrapper pid=3259) ERROR 05-01 14:07:19 worker_base.py:157] ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-SXM2-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
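For reference, this matches the V100's hardware limit. I double-checked the compute capability with a quick one-liner (assuming PyTorch is available, e.g. inside the container; device index 0 is just my single-GPU case):
python3 -c "import torch; print(torch.cuda.get_device_capability(0))"   # prints (7, 0) on V100; bfloat16 needs (8, 0) or newer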
So I added the dtype settings to the compose file:
nemollm-inference:
  container_name: nemollm-inference-microservice
  image: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
  volumes:
    - ${MODEL_DIRECTORY}:/opt/nim/.cache
  user: "${USERID}"
  ports:
    - "8000:8000"
  expose:
    - "8000"
  environment:
    NGC_API_KEY: ${NGC_API_KEY}
    MODEL_DTYPE: half # added to force float16
    NIM_PRECISION: fp16 # additional safeguard
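To confirm those variables actually reach the running container (using the nemollm-inference service name from the snippet above; the exact -f/--profile flags may need to match your setup), I checked with something like:
docker compose exec nemollm-inference env | grep -E 'MODEL_DTYPE|NIM_PRECISION'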
But it doesn't work. I have tried all the options I can find and I am stuck. I would be grateful for a workaround, please.