Video Search and Summarization: models fail to download

I am trying to deploy the Video Search and Summarization blueprint (local_deployment variant) for local use.

docker-compose up hit errors before it could even start:

ERROR: Invalid interpolation format for "volumes" option in service "via-server": "${ASSET_STORAGE_DIR:-/dummy}${ASSET_STORAGE_DIR:+:/tmp/assets}"

Since I just wanted to get the blueprint running and the volume definitions did not seem critical, I commented out the offending volume lines.
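As far as I can tell, that interpolation is only meant to mount ASSET_STORAGE_DIR into the container when the variable is set. A quick shell check of the same expression (a sketch, assuming Compose follows bash-style substitution) behaves like this:

ASSET_STORAGE_DIR=/data/assets
echo "${ASSET_STORAGE_DIR:-/dummy}${ASSET_STORAGE_DIR:+:/tmp/assets}"
# -> /data/assets:/tmp/assets  (host dir mounted to /tmp/assets when the variable is set)
unset ASSET_STORAGE_DIR
echo "${ASSET_STORAGE_DIR:-/dummy}${ASSET_STORAGE_DIR:+:/tmp/assets}"
# -> /dummy  (harmless placeholder when it is not set)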

Now docker-compose runs and creates the containers, but via-server fails to start properly because it cannot download the models.

Initially I hit an NGC API key error, but then I figured out that I needed to put a valid NGC API key in the .env file. Even so, via-server still refuses to download the models.
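For reference, this is roughly what I have in .env now (NGC_API_KEY is my reading of the variable name expected by the compose file; adjust if yours differs):

# .env (sketch; NGC_API_KEY is the name I assume the compose file expects)
NGC_API_KEY=<valid key generated on ngc.nvidia.com>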

Container log:
via-server_1  | Starting VIA server in release mode
via-server_1  | 2025-03-04 10:19:43,657 INFO Initializing VIA Stream Handler
via-server_1  | 2025-03-04 10:19:43,657 INFO Initializing VLM pipeline
via-server_1  | 2025-03-04 10:19:43,957 INFO Downloading model nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8 ...
Getting files to download...
via-server_1  | (progress bar) Total: 29 - Completed: 0 - Failed: 29
via-server_1  |
via-server_1  | --------------------------------------------------------------------------------
via-server_1  |    Download status: FAILED
via-server_1  |    Downloaded local path model: /tmp/tmp5sw8ivy7/vila-1.5-40b_vvila-yi-34b-siglip-stage3_1003_video_v8
via-server_1  |    Total files downloaded: 0
via-server_1  |    Total transferred: 0 B
via-server_1  |    Started at: 2025-03-04 10:19:44
via-server_1  |    Completed at: 2025-03-04 10:19:49
via-server_1  |    Duration taken: 5s
via-server_1  | --------------------------------------------------------------------------------
via-server_1  | 2025-03-04 10:19:49,800 INFO Downloaded model to /root/.via/ngc_model_cache/nim_nvidia_vila-1.5-40b_vila-yi-34b-siglip-stage3_1003_video_v8_vila-llama-3-8b-lita
via-server_1  | 2025-03-04 10:19:49,801 INFO TRT-LLM Engine not found. Generating engines ...
via-server_1  | Selecting INT4 AWQ mode
via-server_1  | Converting Checkpoint ...
via-server_1  | [2025-03-04 10:19:52,856] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
via-server_1  | df: /root/.triton/autotune: No such file or directory
via-server_1  | [TensorRT-LLM] TensorRT-LLM version: 0.18.0.dev2025020400
via-server_1  | Traceback (most recent call last):
via-server_1  |   File "/opt/nvidia/via/via-engine/models/vila15/trt_helper/quantize.py", line 156, in <module>
via-server_1  |     quantize_and_export(
via-server_1  |   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 669, in quantize_and_export
via-server_1  |     hf_config = get_hf_config(model_dir)
via-server_1  |   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 265, in get_hf_config
via-server_1  |     return AutoConfig.from_pretrained(ckpt_path, trust_remote_code=True)   
via-server_1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1053, in from_pretrained
via-server_1  |     raise ValueError(
via-server_1  | ValueError: Unrecognized model in /tmp/tmp.vila.oa7xbt3I. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, ijepa, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmo2, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zoedepth, intern_vit_6b, v2l_projector, llava_llama, llava_mistral, llava_mixtral
via-server_1  | ERROR: Failed to convert checkpoint
via-server_1  | 2025-03-04 10:19:56,338 ERROR Failed to load VIA stream handler - Failed to generate TRT-LLM engine
via-server_1  | Traceback (most recent call last):
via-server_1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 1211, in run
via-server_1  |     self._stream_handler = ViaStreamHandler(self._args)
via-server_1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 373, in __init__
via-server_1  |     self._vlm_pipeline = VlmPipeline(args.asset_dir, args)
via-server_1  |   File "/opt/nvidia/via/via-engine/vlm_pipeline/vlm_pipeline.py", line 965, in __init__
via-server_1  |     raise Exception("Failed to generate TRT-LLM engine")
via-server_1  | Exception: Failed to generate TRT-LLM engine
via-server_1  |
via-server_1  | During handling of the above exception, another exception occurred:
via-server_1  |
via-server_1  | Traceback (most recent call last):
via-server_1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 2572, in <module>  
via-server_1  |     server.run()
via-server_1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 1213, in run
via-server_1  |     raise ViaException(f"Failed to load VIA stream handler - {str(ex)}")   
via-server_1  | via_exception.ViaException: ViaException - code: InternalServerError message: Failed to load VIA stream handler - Failed to generate TRT-LLM engine
via-server_1  | Killed process with PID 70
local_deployment_via-server_1 exited with code 1
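My reading of the log: all 29 files failed to download (0 B transferred), so the NGC model cache directory handed to the quantizer contains no config.json, which is why AutoConfig rejects it and engine generation fails. One way to rule out a key or permission problem would be to try the same download with the NGC CLI directly (a sketch; the model path is copied from the log, and I am assuming the CLI is installed and the model is visible to my org):

ngc config set                                   # paste the same API key that is in .env
ngc registry model list "nim/nvidia/vila-1.5-40b"
ngc registry model download-version "nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"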

What seems to be the problem here? Am I doing something wrong?

NB 1: I also tried the remote_llm_deployment and remote_vlm_deployment variants, with the same results.
NB 2: I could not figure out how to get an NVIDIA_API_KEY (nvapi-***) from build.nvidia.com, which is required for the remote_llm_deployment and remote_vlm_deployment variants. Has that portal moved?

What version of docker-compose are you running? Recommended: v2.32.4.

The errors you are seeing look like a docker-compose version issue, so you should try to update. Check your current version with:

docker compose version

If that reports a 1.x version (or the command is not found), install the latest Compose v2 CLI plugin:
mkdir -p ~/.docker/cli-plugins
curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 -o ~/.docker/cli-plugins/docker-compose
chmod +x ~/.docker/cli-plugins/docker-compose
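Note that the plugin installed this way is invoked as docker compose (no hyphen); the old standalone docker-compose binary from the distro may still be on your PATH, so either remove it or make sure the blueprint scripts use the v2 form. You can compare the two:

which docker-compose && docker-compose version   # old standalone v1 from the distro package, if still installed
docker compose version                           # the v2 plugin installed above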

I was using the version that ships with the Ubuntu 22.04 package manager - v1.29.2 🫣 Thanks for pointing out that it might be too old.

After updating docker-compose to the latest version, the interpolation errors have gone away.


Now the remaining issue is the same as in VSS blueprint 2.2.0 - ERROR Failed to load VIA stream handler - Failed to generate TRT-LLM engine. I will try the workarounds suggested there and see whether I can get the blueprint running successfully.
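Before retrying, I will also confirm that the VILA model actually downloaded this time (the earlier run reported 0 of 29 files), e.g. by looking inside the running container (a sketch; the container name is whatever docker compose ps reports for via-server, and the cache path is taken from the log above):

docker compose ps                                                          # find the via-server container name
docker exec -it <via-server-container> ls -l /root/.via/ngc_model_cache/   # cache path from the earlier log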