Issues while starting NIM container in A10 VM

When I try to deploy the llama3-8b NIM, I run into the following issue:

INFO 08-24 19:39:57.168 ngc_injector.py:172] Model workspace is now ready. It took 2.855 seconds
INFO 08-24 19:39:57.173 async_trtllm_engine.py:74] Initializing an LLM engine (v1.0.0) with config: model='/tmp/meta--llama3-8b-instruct-75evn1pg', speculative_config=None, tokenizer='/tmp/meta--llama3-8b-instruct-75evn1pg', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
WARNING 08-24 19:39:57.539 logging.py:314] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 08-24 19:39:57.554 utils.py:201] Using 0 bytes of gpu memory for PEFT cache
INFO 08-24 19:39:57.554 utils.py:207] Engine size in bytes 16067779716
INFO 08-24 19:39:57.554 utils.py:211] available device memory 23606329344
INFO 08-24 19:39:57.554 utils.py:218] Setting free_gpu_memory_fraction to 0.9
/opt/nim/start-server.sh: line 61:    32 Killed                  python3 -m vllm_nvext.entrypoints.openai.api_server

I’m running it on an AWS EC2 g5.xlarge instance with the Deep Learning OSS Nvidia Driver AMI GPU TensorFlow 2.16 (Ubuntu 20.04) image.
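
For reference, I’m launching the container roughly like this (the cache path and image tag below are illustrative and may differ slightly from my exact command):

# NGC API key is needed to pull the model from NGC
export NGC_API_KEY=<your NGC API key>

# Start the llama3-8b-instruct NIM, exposing the OpenAI-compatible API on port 8000
docker run -it --rm --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:1.0.0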

Could someone please help me?

Thank you.

Hi @cOde_RabBIT, thanks for reporting this. We are taking a look to see if we can reproduce it.

@cOde_RabBIT We were able to reproduce the issue. The "Killed" message indicates the process was terminated by the Linux OOM killer: the current VM instance (g5.xlarge) runs out of host memory while loading the roughly 16 GB engine reported in the log. Upgrading to a larger g5 instance (g5.2xlarge), which has twice the system RAM, resolves the issue.
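
If you want to confirm the host-memory diagnosis on your instance, the kernel log should show the OOM killer terminating the python3 process (run these on the host, outside the container):

# Look for OOM-killer activity in the kernel log
sudo dmesg -T | grep -iE 'out of memory|killed process'

# Watch available host memory while the container loads the engine
free -h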

Thanks @gvenkatakris, it worked :)
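
For anyone hitting the same error: once the container starts on the larger instance, you can check that it is serving via the standard NIM endpoints (port 8000 and the model name below follow the defaults from the launch command above; list the served models with /v1/models if yours differ):

# Readiness check
curl http://0.0.0.0:8000/v1/health/ready

# Minimal chat completion request against the OpenAI-compatible API
curl -X POST http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'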