Description
When running the NIM Docker container with the script below, it starts normally if only one GPU is available or a specific GPU is selected. If multiple GPUs are available, the container hangs and GPU utilization is pinned at 100%.
```bash
# Choose a container name for bookkeeping
export CONTAINER_NAME=llama3-8b-instruct

# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/nim/meta/${CONTAINER_NAME}:1.0.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# Start the LLM NIM
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
```
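As a workaround sketch (my assumption, not verified guidance): since the container runs fine when a single GPU is specified, pinning it to one device with Docker's `--gpus device=N` syntax should sidestep the hang. The device index `0` here is illustrative.

```bash
# Hypothetical workaround: expose only one GPU to the container
# instead of all of them (device index 0 chosen for illustration)
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus device=0 \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
```

If the image supports the `list-model-profiles` utility (documented for NIM 1.0 images), running it via `docker run --rm --runtime=nvidia --gpus all $IMG_NAME list-model-profiles` may also help confirm whether a multi-GPU tensor-parallel profile is being auto-selected when all GPUs are visible, which could be related to the hang.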
Environment
TensorRT Version:
GPU Type: A100
Nvidia Driver Version: 535.161.07
CUDA Version: 12.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):