Saving chunks and their summaries and metadata in VSS

Saving VSS chunks together with their related summaries and metadata.

Trying to check whether anyone has tried or figured out how to save chunks along with their metadata and summaries.

Have you raised the same issue on GitHub?

What deployment method do you use? Helm or Docker Compose? Which VLM/LLM do you use?

If you are deploying with the Docker Compose local-deployment configuration, you can find the embedding files under /tmp/assets.

For example:

/tmp/assets/
├── 40549d1d-edfb-4f39-a839-c10570612496
│   ├── embeddings
│   │   └── 3212225393188_3242225393188
│   │       ├── chunk_info.json
│   │       ├── embeddings.safetensors
│   │       └── video_frames_times.json
│   └── info.json

Please refer to this link
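To illustrate what those files contain, here is a minimal sketch for inspecting the per-chunk metadata. It assumes only that chunk_info.json and video_frames_times.json are plain JSON files, as their names suggest; the embeddings.safetensors file in the same directory holds the raw embedding tensors and would need the separate `safetensors` package to read.

```python
import json
from pathlib import Path


def load_chunk_metadata(chunk_dir):
    """Load the per-chunk JSON metadata written under /tmp/assets.

    chunk_dir is a directory like
    /tmp/assets/<asset-uuid>/embeddings/<start>_<end>/ and is expected
    to contain chunk_info.json and video_frames_times.json. The
    embeddings.safetensors file next to them can be read with the
    `safetensors` package (safetensors.safe_open).
    """
    chunk_dir = Path(chunk_dir)
    chunk_info = json.loads((chunk_dir / "chunk_info.json").read_text())
    frame_times = json.loads((chunk_dir / "video_frames_times.json").read_text())
    return {"chunk_info": chunk_info, "frame_times": frame_times}
```

The exact keys inside chunk_info.json depend on the VSS version, so treat this as a starting point for poking at the files rather than a stable API.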

Yup, I raised the same issue on GitHub as well.
For now, I'm just using the default Docker Compose file for single-GPU local deployment, and I'm not able to replicate the folder structure you mentioned above.

I'll try once more and reach out if I need any more info, thanks.

Is it possible I missed some variables in the compose file that enable saving chunks and their info?

Let's discuss this issue in the forum, and I will close the GitHub issue.

The post linked above contains the following code. That data is only generated when using the vila-1.5 model.

# Embedding generation is enabled only when the VLM is not NVILA
self._have_emb_gen = False
if args.vlm_model_type != VlmModelType.NVILA:
    self._have_emb_gen = True

Try deploying with the local-deployment configuration. By the way, why do you want to obtain this embedding information? It is not human-readable.

I was just checking what information is stored when using live streaming, specifically: the chunk timestamps, their corresponding summaries, and the specific frames associated with each chunk.

2025-09-24 12:04:01,811 INFO Using model cached at /root/.via/ngc_model_cache/nim_nvidia_vila-1.5-40b_vila-yi-34b-siglip-stage3_1003_video_v8_vila-llama-3-8b-lita
via-server3-1 | 2025-09-24 12:04:01,811 INFO TRT-LLM Engine not found. Generating engines …
via-server3-1 | Selecting INT4 AWQ mode
via-server3-1 | Converting Checkpoint …
via-server3-1 | [2025-09-24 12:04:05,036] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
via-server3-1 | df: /root/.triton/autotune: No such file or directory
via-server3-1 | 2025-09-24 12:04:07,081 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
via-server3-1 | [TensorRT-LLM] TensorRT-LLM version: 0.18.0.dev2025022500
via-server3-1 | Traceback (most recent call last):
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 685, in _get_config_dict
via-server3-1 |     config_dict = cls._dict_from_json_file(resolved_config_file)
via-server3-1 |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 794, in _dict_from_json_file
via-server3-1 |     return json.loads(text)
via-server3-1 |            ^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
via-server3-1 |     return _default_decoder.decode(s)
via-server3-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
via-server3-1 |     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
via-server3-1 |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/lib/python3.12/json/decoder.py", line 355, in raw_decode
via-server3-1 |     raise JSONDecodeError("Expecting value", s, err.value) from None
via-server3-1 | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
via-server3-1 |
via-server3-1 | During handling of the above exception, another exception occurred:
via-server3-1 |
via-server3-1 | Traceback (most recent call last):
via-server3-1 |   File "/opt/nvidia/via/via-engine/models/vila15/trt_helper/quantize.py", line 167, in <module>
via-server3-1 |     quantize_and_export(
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 690, in quantize_and_export
via-server3-1 |     model = get_model(model_dir, dtype, device=device, device_map=device_map)
via-server3-1 |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 317, in get_model
via-server3-1 |     model = _get_vila_model(ckpt_path)
via-server3-1 |             ^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 260, in _get_vila_model
via-server3-1 |     model = AutoModel.from_pretrained(
via-server3-1 |             ^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
via-server3-1 |     return model_class.from_pretrained(
via-server3-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/opt/nvidia/via/via-engine/models/vila15/VILA/llava/model/language_model/llava_llama.py", line 61, in from_pretrained
via-server3-1 |     return cls.load_pretrained(
via-server3-1 |            ^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/opt/nvidia/via/via-engine/models/vila15/VILA/llava/model/llava_arch.py", line 127, in load_pretrained
via-server3-1 |     vlm = cls(config, *args, **kwargs)
via-server3-1 |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/opt/nvidia/via/via-engine/models/vila15/VILA/llava/model/language_model/llava_llama.py", line 43, in __init__
via-server3-1 |     return self.init_vlm(config=config, *args, **kwargs)
via-server3-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/opt/nvidia/via/via-engine/models/vila15/VILA/llava/model/llava_arch.py", line 76, in init_vlm
via-server3-1 |     self.llm, self.tokenizer = build_llm_and_tokenizer(llm_cfg, config, *args, **kwargs)
via-server3-1 |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/opt/nvidia/via/via-engine/models/vila15/VILA/llava/model/language_model/builder.py", line 65, in build_llm_and_tokenizer
via-server3-1 |     llm_cfg = AutoConfig.from_pretrained(model_name_or_path)
via-server3-1 |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1054, in from_pretrained
via-server3-1 |     config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
via-server3-1 |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 591, in get_config_dict
via-server3-1 |     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
via-server3-1 |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
via-server3-1 |     raise EnvironmentError(
via-server3-1 | OSError: It looks like the config file at '/tmp/tmp.vila.yPJ8uXSd/llm/config.json' is not a valid JSON file.
via-server3-1 | ERROR: Failed to convert checkpoint
via-server3-1 | 2025-09-24 12:04:08,246 ERROR Failed to load VIA stream handler - Failed to generate TRT-LLM engine
via-server3-1 | Traceback (most recent call last):
via-server3-1 |   File "/opt/nvidia/via/via-engine/via_server.py", line 1370, in run
via-server3-1 |     self._stream_handler = ViaStreamHandler(self._args)
via-server3-1 |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 503, in __init__
via-server3-1 |     self._vlm_pipeline = VlmPipeline(args.asset_dir, args)
via-server3-1 |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server3-1 |   File "/opt/nvidia/via/via-engine/vlm_pipeline/vlm_pipeline.py", line 1325, in __init__
via-server3-1 |     raise Exception("Failed to generate TRT-LLM engine")
via-server3-1 | Exception: Failed to generate TRT-LLM engine
via-server3-1 |
via-server3-1 | During handling of the above exception, another exception occurred:
via-server3-1 |
via-server3-1 | Traceback (most recent call last):
via-server3-1 |   File "/opt/nvidia/via/via-engine/via_server.py", line 2889, in <module>
via-server3-1 |     server.run()
via-server3-1 |   File "/opt/nvidia/via/via-engine/via_server.py", line 1372, in run
via-server3-1 |     raise ViaException(f"Failed to load VIA stream handler - {str(ex)}")
via-server3-1 | via_exception.ViaException: ViaException - code: InternalServerError message: Failed to load VIA stream handler - Failed to generate TRT-LLM engine
via-server3-1 | Killed process with PID 134
via-server3-1 exited with code 1

—————————————–
I am facing this when I use local_deployment with vila-1.5; the same issue occurs with local_deployment_single_gpu as well.
Kindly provide any further documentation available for using vila-1.5 (e.g., trt_engine_path and related environment variables, and the paths to set for them in the compose file), and for cosmos if available. Thanks.

Running 4x L40S GPUs. NVIDIA-SMI 565.57.01, Driver Version: 565.57.01, CUDA Version: 12.7.

This problem is caused by insufficient GPU memory. vila-1.5 takes up more GPU memory than nvila, and 4x L40S may not meet its requirements.

For nvila, you can make some modifications to the above code to save chunk_info.json.
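As a sketch of what such a modification might look like, the flag logic from the earlier snippet can be rewritten so that embedding generation (and hence chunk_info.json saving) is also enabled for NVILA. The `VlmModelType` enum below is a stand-in that mirrors the one in the snippet; whether flipping this flag alone is sufficient depends on the rest of the VSS pipeline.

```python
from enum import Enum


class VlmModelType(Enum):
    """Stand-in for the enum used in the snippet above (assumption)."""
    NVILA = "nvila"
    VILA_15 = "vila-1.5"


def have_emb_gen(vlm_model_type, save_for_nvila=False):
    """Mirror of the _have_emb_gen flag logic.

    By default, embeddings are generated only for non-NVILA models
    (i.e., vila-1.5). Passing save_for_nvila=True is the kind of
    modification that would enable saving for nvila as well.
    """
    if save_for_nvila:
        return True
    return vlm_model_type != VlmModelType.NVILA
```

In the actual source, this would amount to setting `self._have_emb_gen = True` unconditionally (or adding an opt-in condition for NVILA) in the pipeline constructor.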

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.