Error while downloading VIA

Hi, I am trying out VIA on my local machine. However, after a few retries, the VITA model download keeps failing:

Getting files to download...
⠋ ━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━ • 7.3/16.6 GiB • Remaining: 0:13:30 • 12.3 MB/s • Elapsed: 0:55:44 • Total: 15 - Completed: 13 - Failed: 2

------------------------------------------------------------
   Download status: FAILED
   Downloaded local path model: /tmp/tmpd6oywkr0/vita_v2.0.1
   Total files downloaded: 13
   Total transferred: 7.34 GB
   Started at: 2024-08-20 04:04:48
   Completed at: 2024-08-20 05:00:32
   Duration taken: 55m 44s
------------------------------------------------------------
2024-08-20 05:00:36,637 INFO Downloaded model to /root/.via/ngc_model_cache/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita
2024-08-20 05:00:36,639 INFO TRT-LLM Engine not found. Generating engines ...
Selecting FP16 mode
Converting Checkpoint ...
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024043000
0.10.0.dev2024043000
Loading checkpoint shards:   0%|                                                                                                  | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 447, in <module>
    main()
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 439, in main
    convert_and_save_hf(args)
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 356, in convert_and_save_hf
    hf_model = preload_model(model_dir) if not args.load_by_shard else None
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 317, in preload_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3502, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3903, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 505, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
FileNotFoundError: No such file or directory: "/root/.via/ngc_model_cache/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/model-00001-of-00004.safetensors"
ERROR: Failed to convert checkpoint
2024-08-20 05:00:40,831 ERROR Failed to load VIA pipeline - Failed to generate TRT-LLM engine
Killed process with PID 50

Besides, I am trying the OpenAI API method, but it requires GPT-4o. Do we need a subscription to GPT-4o to run VIA?

Could you describe your steps in detail? Have you run the export NGC_MODEL_CACHE=</SOME/DIR/ON/HOST> command? You need to replace </SOME/DIR/ON/HOST> with a real directory on your host.
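For example (the path below is just an illustration, not a required location; any existing, writable directory on the host works):

mkdir -p /home/User/via_model_cache
export NGC_MODEL_CACHE=/home/User/via_model_cache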

Yes.

I am running VIA with the VITA model. However, every time I run the application it starts downloading the VITA model and fails. So I went to NGC to download and extract the model myself from here using wget.

After downloading and extracting the model into the /home/User/Desktop/VIA path, may I know what command I should use to run the Docker container? Should I replace

export MODEL_PATH="ngc:nvidia/tao/vita:2.0.1" 
with
export MODEL_PATH="/home/User/Desktop/VIA" 

No. MODEL_PATH is the model directory inside the container.
You can try running this command first:

export NGC_MODEL_CACHE=/home/User/Desktop/VIA

Then include this mount in your docker run command:

-v $NGC_MODEL_CACHE:/root/.via/ngc_model_cache \

After running this command:

docker run --rm -it --ipc=host --ulimit memlock=-1 \
       --ulimit stack=67108864 --tmpfs /tmp:exec --name via-server \
       --gpus '"device=all"' \
       -p $FRONTEND_PORT:$FRONTEND_PORT \
       -p $BACKEND_PORT:$BACKEND_PORT \
       -e BACKEND_PORT=$BACKEND_PORT \
       -e FRONTEND_PORT=$FRONTEND_PORT \
       -e NVIDIA_API_KEY=$NVIDIA_API_KEY \
       -e NGC_API_KEY=$NGC_API_KEY \
       -v $NGC_MODEL_CACHE:/root/.via/ngc_model_cache \
       -e MODEL_PATH=$MODEL_PATH \
       -e VLM_MODEL_TO_USE=vita-2.0 \
       -v via-hf-cache:/tmp/huggingface \
       nvcr.io/metropolis/via-dp/via-engine:2.0-dp

it still shows "Getting files to download…". Is there something I should configure if I downloaded the model using the wget method?

I am getting this error after the download fails.

------------------------------------------------------------
   Download status: FAILED
   Downloaded local path model: /tmp/tmp95v9pgqw/vita_v2.0.1
   Total files downloaded: 13
   Total transferred: 7.42 GB
   Started at: 2024-08-21 06:09:01
   Completed at: 2024-08-21 07:11:14
   Duration taken: 1h 2m 12s
------------------------------------------------------------
2024-08-21 07:11:18,176 INFO Downloaded model to /root/.via/ngc_model_cache/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita
2024-08-21 07:11:18,177 INFO TRT-LLM Engine not found. Generating engines ...
Selecting FP16 mode
Converting Checkpoint ...
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024043000
0.10.0.dev2024043000
Loading checkpoint shards:   0%|                                                                                          | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 447, in <module>
    main()
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 439, in main
    convert_and_save_hf(args)
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 356, in convert_and_save_hf
    hf_model = preload_model(model_dir) if not args.load_by_shard else None
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/convert_checkpoint.py", line 317, in preload_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3502, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3903, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 505, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
FileNotFoundError: No such file or directory: "/root/.via/ngc_model_cache/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/model-00001-of-00004.safetensors"
ERROR: Failed to convert checkpoint
2024-08-21 07:11:22,407 ERROR Failed to load VIA pipeline - Failed to generate TRT-LLM engine

  1. Have you obtained the NVIDIA API Key and the NGC API Key by referring to our Guide?
  2. Since you downloaded the model yourself, could you try creating an nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita directory inside your /home/User/Desktop/VIA directory and putting the model files in it? Then try running it again (see the sketch after this list).
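A minimal sketch of what that looks like, assuming the wget download was extracted into /home/User/Desktop/VIA and that the extracted folder is named vita_v2.0.1 (that folder name is a guess; adjust the source path to whatever your archive actually unpacked to):

mkdir -p /home/User/Desktop/VIA/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita
# move the model files (safetensors shards, config, tokenizer, etc.) into the new directory
mv /home/User/Desktop/VIA/vita_v2.0.1/* \
   /home/User/Desktop/VIA/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/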

I am able to load the model after making the directory you mentioned. Also, I have already obtained the NGC and NVIDIA API keys as mentioned in the documentation.

I would also like to ask: for the VITA model, do we need to pay for a subscription to test it out?

2024-08-21 08:55:06,079 ERROR LLM call failed:
Error code: 402 - {'type': 'urn:kaizen:problem-details:payment-required', 'title': 'Payment Required', 'status': 402, 'detail': "Account '2c7slESkQF-uGIymCkSJGl9teCmX3mGzOeIwT2cjvow': Cloud credits expired - Please contact NVIDIA representatives", 'instance': '/v2/nvcf/pexec/functions/a88f115a-4a47-4381-ad62-ca25dc33dc1b'}
2024-08-21 08:55:06,079 PERF Summarization/BatchSummarization time = 595.88 ms
2024-08-21 08:55:06,080 ERROR Summarize failed:
Server didn't respond
2024-08-21 08:55:06,080 INFO Stopping VIA pipeline
2024-08-21 08:55:06,080 ERROR Server didn't respond
2024-08-21 08:55:06,080 ERROR Failed to load VIA pipeline - CA-RAG setup failed. Check if NVIDIA_API_KEY set correctly and/or LLM configuration in CA-RAG config is valid.

The link where I got the API key for Llama: NVIDIA NIM | llama3-70b
The links where I got the NGC key: https://org.ngc.nvidia.com/setup/personal-keys & https://org.ngc.nvidia.com/setup/api-key (I have tried both API and personal keys but get the same error).

Also, the above error mentions that payment is required. Please advise.

How long ago did you apply for your NVIDIA_API_KEY? It's possible that this NVIDIA_API_KEY has expired.

The key I applied for is 2 days old. I have generated a new key and the same error persists. Please advise.

For your reference, the API_KEY I obtained comes from this link: NVIDIA NIM | llama3-70b

I noticed my account has 0 credits left:

May I know if this is the issue causing the failed API calls? And how can I get more credits if I want to test VIA out?

We’ll confirm that and let you know when we have a conclusion.

Any updates?

We are still investigating and analyzing this problem. Were you ever able to run VIA normally before your cloud credits ran out?

Nope, this is my first time running it on the early access. Do you need my API key to test, or does the API key from your side not work either? Let me know if you need it so I can email it to you.

That would be helpful. You can just click my icon and send it to me in a message.

Hey Jason,
Would you be able to download and locally deploy the Llama3-8b NIM? There is a section on downloading and self-hosting NIMs in the VIA DP user guide that details the steps to do this (pg. 44). Please let us know if you have any problems.
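For reference, launching the self-hosted NIM roughly follows the general NIM quick-start pattern sketched below; the image name, tag, port, and cache path are assumptions on my part and may differ from the VIA DP guide, so please follow the exact command on pg. 44:

docker login nvcr.io   # username: $oauthtoken, password: your NGC_API_KEY
docker run -it --rm --gpus all \
    -e NGC_API_KEY=$NGC_API_KEY \
    -v ~/.cache/nim:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/meta/llama3-8b-instruct:latest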

Hi, I managed to deploy llama3-8b locally. After doing that, could you guide me on how to modify this command to use my local container instead?

docker run --rm -it --ipc=host --ulimit memlock=-1 \
 --ulimit stack=67108864 --tmpfs /tmp:exec --name via-server \
 --gpus '"device=all"' \
 -p $FRONTEND_PORT:$FRONTEND_PORT \
 -p $BACKEND_PORT:$BACKEND_PORT \
 -e BACKEND_PORT=$BACKEND_PORT \
 -e FRONTEND_PORT=$FRONTEND_PORT \
 -e NVIDIA_API_KEY=$NVIDIA_API_KEY \
 -e NGC_API_KEY=$NGC_API_KEY \
 -e VLM_MODEL_TO_USE=vita-2.0 \
 -v $NGC_MODEL_CACHE:/root/.via/ngc_model_cache \
 -e MODEL_PATH=$MODEL_PATH \
 -v via-hf-cache:/tmp/huggingface \
 nvcr.io/metropolis/via-dp/via-engine:2.0-dp

The running llama3-8b works when I send an API request:

INFO 08-26 03:35:52.159 httptools_impl.py:481] 172.17.0.1:38692 - "POST /v1/completions HTTP/1.1" 200
INFO 08-26 03:36:00.804 metrics.py:334] Avg prompt throughput: 0.5 tokens/s, Avg generation throughput: 6.4 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 08-26 03:36:10.805 metrics.py:334] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
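For completeness, a request like the one shown in the log above can be sent with curl; the host, port, and model name here are assumptions for a default local deployment, so adjust them to your setup:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta/llama3-8b-instruct", "prompt": "Hello", "max_tokens": 64}'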

As per the Guide, you first need to modify the config files in the source code, default_config.yaml and config.yml.
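As a rough illustration only (these key names are a sketch, not the definitive schema; check the sample configs shipped with VIA for the exact structure), the LLM endpoint in those files needs to point at your local NIM instead of the hosted API, e.g.:

llm:
  model: meta/llama3-8b-instruct              # assumed: model name served by your local NIM
  base_url: http://<LOCAL_NIM_HOST>:8000/v1   # assumed: OpenAI-compatible endpoint of the local NIM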

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.