Llama-3.1-Nemotron-70B-Instruct An error occurred in MPI_Init_thread

JadeLu · February 19, 2025, 3:21am

I downloaded Llama-3.1-Nemotron-70B-Instruct from NIM.
I am using the following environment in nvidia_entrypoint but am still getting the MPI error. What did I miss?

LD_LIBRARY_PATH=/opt/hpcx/ucc/lib/ucc:/opt/hpcx/ucc/lib:/opt/hpcx/ucx/lib/ucx:/opt/hpcx/ucx/lib:/opt/hpcx/ompi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/nim/llm/.venv/lib/python3.10/site-packages/tensorrt_llm/libs:/opt/nim/llm/.venv/lib/python3.10/site-packages/nvidia/cublas/lib:/opt/nim/llm/.venv/lib/python3.10/site-packages/tensorrt_libs:/opt/nim/llm/.venv/lib/python3.10/site-packages/nvidia/nccl/lib

PATH=/opt/nim/llm/.venv/bin:/opt/hpcx/ucc/bin:/opt/hpcx/ucx/bin:/opt/hpcx/ompi/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin

OPAL_PREFIX=/opt/hpcx/ompi

HPCX_HOME=/opt/hpcx

which mpirun
/opt/hpcx/ompi/bin/mpirun

sophwats · February 19, 2025, 4:58pm

Hi @JadeLu the team is taking a look into this, and we will get back to you when we have a better idea of what’s happening. Thanks for your patience.
