Error converting VITA-2.0 model checkpoint

I’m using VIA Engine to host the VITA-2.0 model on an A100 VM. The VM specifications are:

  • RAM: 70GB
  • NVIDIA Driver Version: 535.161.08
  • VRAM: 40GB

I’m using the nvcr.io/metropolis/via-dp/via-engine:2.0-dp image to deploy the model.

I ran the container using:

export BACKEND_PORT=8000
export FRONTEND_PORT=9000
export NVIDIA_API_KEY=<My-NVIDIA-API-Key>
export NGC_API_KEY=<My-NGC-Key>
# MODEL_PATH is the path as seen inside the container (it matches the -v mount target below)
export MODEL_PATH=/root/via-vita-model/vita_2.0.1
# Docker bind mounts require absolute host paths, hence $(pwd)
export NGC_MODEL_CACHE=$(pwd)/via-vita-model/cache
export VLM_MODEL_TO_USE=vita-2.0

docker run -it --ipc=host --ulimit memlock=-1 \
       --ulimit stack=67108864 --tmpfs /tmp:exec --name via-server-2 \
       --gpus '"device=all"' \
       -p $FRONTEND_PORT:$FRONTEND_PORT \
       -p $BACKEND_PORT:$BACKEND_PORT \
       -e BACKEND_PORT=$BACKEND_PORT \
       -e FRONTEND_PORT=$FRONTEND_PORT \
       -e NVIDIA_API_KEY=$NVIDIA_API_KEY \
       -e NGC_API_KEY=$NGC_API_KEY \
       -e VLM_BATCH_SIZE=1 \
       -v $NGC_MODEL_CACHE:/root/.via/ngc_model_cache \
       -v $(pwd)/via-vita-model/vita_2.0.1:$MODEL_PATH \
       -e MODEL_PATH=$MODEL_PATH \
       -e VLM_MODEL_TO_USE=$VLM_MODEL_TO_USE \
       -v via-hf-cache:/tmp/huggingface \
       nvcr.io/metropolis/via-dp/via-engine:2.0-dp

With this command, I was getting the following NVML error:

[10/09/2024-05:12:34] [TRT-LLM] [I] Compute capability: (8, 0)
[10/09/2024-05:12:34] [TRT-LLM] [I] SM count: 108
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 427, in main
    cluster_config = infer_cluster_config()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 538, in infer_cluster_config
    cluster_info=infer_cluster_info(),
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 460, in infer_cluster_info
    sm_clock = pynvml.nvmlDeviceGetMaxClockInfo(
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 2182, in nvmlDeviceGetMaxClockInfo
    _nvmlCheckReturn(ret)
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
ERROR: Failed to build TRT engine
2024-10-09 05:12:36,731 ERROR Failed to load VIA pipeline - Failed to generate TRT-LLM engine
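
(For anyone reproducing this: the failing call can be checked outside of VIA with a short pynvml script. This is a minimal sketch, assuming GPU index 0; pynvml is already installed in the container alongside tensorrt_llm.)

# check_nvml_clock.py - reproduce the query that infer_cluster_info() makes
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    # On some virtualized GPUs (vGPU/passthrough) this raises NVMLError_NotSupported,
    # which is exactly what the trtllm-build traceback above shows.
    sm_clock = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_SM)
    print(f"Max SM clock: {sm_clock} MHz")
except pynvml.NVMLError as err:
    print(f"NVML query failed: {err}")
finally:
    pynvml.nvmlShutdown()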

To fix this issue, I referred to this GitHub issue and upgraded tensorrt-llm to 0.12.0.dev2024070200, but now I’m getting the error below:

Selecting FP16 mode
Converting Checkpoint ...
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024070200
Traceback (most recent call last):
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 14, in <module>
    from tensorrt_llm.models.llama.weight import load_from_gptq_llama
ModuleNotFoundError: No module named 'tensorrt_llm.models.llama.weight'
ERROR: Failed to convert checkpoint
2024-10-09 05:16:23,658 ERROR Failed to load VIA pipeline - Failed to generate TRT-LLM engine
Killed
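
From the traceback, convert_checkpoint.py still imports the pre-0.10 TensorRT-LLM module layout; tensorrt_llm.models.llama.weight appears to have been refactored away in later releases. One way to probe this is a guarded import like the sketch below; the fallback module path and symbol name are assumptions that must be verified against the installed tensorrt_llm version:

# Hypothetical compatibility shim for convert_checkpoint.py
try:
    # Layout expected by the stock script (older TRT-LLM releases)
    from tensorrt_llm.models.llama.weight import load_from_gptq_llama
except ModuleNotFoundError:
    # Newer releases moved the llama weight-loading helpers; this fallback
    # path is an assumption - check where your installed version exposes
    # the GPTQ loader before relying on it.
    from tensorrt_llm.models.llama.convert import load_from_gptq_llama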

Note: I downloaded the VITA-2.0 model from NGC.

Please suggest a possible solution to this issue.

We will investigate this issue internally and provide feedback as soon as we reach a conclusion.

Hi @apoorv.mishra01, currently we do not support VM environments. Could you try this directly on the A100 host?
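
(One quick way to confirm whether your GPU is virtualized is to query the NVML virtualization mode; a minimal pynvml sketch, assuming GPU index 0:)

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# 0 = none (bare metal), 1 = pass-through, 2 = vGPU, 3 = host vGPU, 4 = host vSGA
mode = pynvml.nvmlDeviceGetVirtualizationMode(handle)
print(f"GPU virtualization mode: {mode}")
pynvml.nvmlShutdown()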

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
