/opt/nim/start-server.sh: line 61: 32 Killed python3 -m vllm_nvext.entrypoints.openai.api_server

Hi, I get the error below when running the NVIDIA NIM Docker container nvcr.io/nim/meta/llama3-8b-instruct:1.0.0:

/opt/nim/start-server.sh: line 61: 32 Killed python3 -m vllm_nvext.entrypoints.openai.api_server

AWS Instance Type
g5.xlarge

NVIDIA GPU-Optimized AMI

uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="XXX"
SUPPORT_URL="XXX"
BUG_REPORT_URL="XXX"
PRIVACY_POLICY_URL="XXX"
UBUNTU_CODENAME=jammy

lspci | grep -i nvidia
00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1)

nvidia-smi
Tue Jul 9 20:51:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
|  0%   32C    P8               9W / 300W |      0MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
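
A bare `Killed` with no Python traceback usually means the kernel OOM killer terminated the process because host RAM (not GPU memory) ran out; a g5.xlarge has 16 GiB of system RAM, while the startup log below reports an engine size of roughly 16 GB. This is a quick sketch to compare the two numbers and look for OOM-killer traces (standard Linux commands, nothing NIM-specific; the engine size is copied from the log):

```shell
# Host RAM from /proc/meminfo (reported in kB) vs. the engine size from the NIM log
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
engine_bytes=16067779716   # "Engine size in bytes" from the startup log below
echo "Host RAM:    $((total_kb / 1024)) MiB"
echo "Engine size: $((engine_bytes / 1024 / 1024)) MiB"

# After a kill, the OOM killer leaves a trace in the kernel ring buffer:
sudo dmesg | grep -iE "out of memory|oom-kill|killed process" | tail -n 5
```

If `dmesg` shows an `oom-kill` entry naming the python3 process, the fix is more host RAM (a larger instance type or swap), not a GPU-side change.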

Complete docker run command and output:
docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY -v ~/.cache/nim:/opt/nim/.cache -p 8000:8000 nvcr.io/nim/meta/llama3-8b-instruct:1.0.0

===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================

NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama3-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement here:
XXXX.
A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the AI Foundation Models Community License
here: XXXX

ADDITIONAL INFORMATION: Meta Llama 3 Community License, Built with Meta Llama 3.
A copy of the Llama 3 license can be found under /opt/nim/MODEL_LICENSE.

2024-07-09 20:34:26,742 [INFO] PyTorch version 2.2.2 available.
2024-07-09 20:34:27,768 [WARNING] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
2024-07-09 20:34:27,768 [INFO] [TRT-LLM] [I] Starting TensorRT-LLM init.
2024-07-09 20:34:27,935 [INFO] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.10.1.dev2024053000
INFO 07-09 20:34:29.400 api_server.py:489] NIM LLM API version 1.0.0
INFO 07-09 20:34:29.406 ngc_profile.py:217] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.
INFO 07-09 20:34:29.406 ngc_profile.py:219] Detected 2 compatible profile(s).
INFO 07-09 20:34:29.406 ngc_injector.py:106] Valid profile: c334b76d50783655bdf62b8138511456f7b23083553d310268d0d05f254c012b (tensorrt_llm-a10g-fp16-tp1-throughput) on GPUs [0]
INFO 07-09 20:34:29.406 ngc_injector.py:106] Valid profile: 8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d (vllm-fp16-tp1) on GPUs [0]
INFO 07-09 20:34:29.407 ngc_injector.py:141] Selected profile: c334b76d50783655bdf62b8138511456f7b23083553d310268d0d05f254c012b (tensorrt_llm-a10g-fp16-tp1-throughput)
INFO 07-09 20:34:30.223 ngc_injector.py:146] Profile metadata: gpu_device: 2237:10de
INFO 07-09 20:34:30.223 ngc_injector.py:146] Profile metadata: profile: throughput
INFO 07-09 20:34:30.223 ngc_injector.py:146] Profile metadata: tp: 1
INFO 07-09 20:34:30.223 ngc_injector.py:146] Profile metadata: pp: 1
INFO 07-09 20:34:30.223 ngc_injector.py:146] Profile metadata: feat_lora: false
INFO 07-09 20:34:30.223 ngc_injector.py:146] Profile metadata: gpu: A10G
INFO 07-09 20:34:30.223 ngc_injector.py:146] Profile metadata: llm_engine: tensorrt_llm
INFO 07-09 20:34:30.223 ngc_injector.py:146] Profile metadata: precision: fp16
INFO 07-09 20:34:30.223 ngc_injector.py:166] Preparing model workspace. This step might download additional files to run the model.
INFO 07-09 20:34:33.320 ngc_injector.py:172] Model workspace is now ready. It took 3.097 seconds
INFO 07-09 20:34:33.325 async_trtllm_engine.py:74] Initializing an LLM engine (v1.0.0) with config: model='/tmp/meta--llama3-8b-instruct-ba3010gf', speculative_config=None, tokenizer='/tmp/meta--llama3-8b-instruct-ba3010gf', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
WARNING 07-09 20:34:33.683 logging.py:314] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 07-09 20:34:33.699 utils.py:201] Using 0 bytes of gpu memory for PEFT cache
INFO 07-09 20:34:33.700 utils.py:207] Engine size in bytes 16067779716
INFO 07-09 20:34:33.700 utils.py:211] available device memory 23606263808
INFO 07-09 20:34:33.700 utils.py:218] Setting free_gpu_memory_fraction to 0.9
/opt/nim/start-server.sh: line 61: 32 Killed python3 -m vllm_nvext.entrypoints.openai.api_server
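
For reference, the log shows two valid profiles, and the TensorRT-LLM one is auto-selected. Based on the NIM documentation, a specific profile can be pinned with the `NIM_MODEL_PROFILE` environment variable; the sketch below pins the vLLM profile ID copied from the "Valid profile" lines above. This is an untested variant of my run command, not a confirmed fix:

```shell
# Pin the vLLM profile instead of the auto-selected TensorRT-LLM one
# (profile ID taken from the "Valid profile" lines in the startup log)
docker run -it --rm --gpus all --shm-size=16GB \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
```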