I’m currently integrating a custom VLM (Neva-22B, https://ai.api.nvidia.com/v1/vlm/nvidia/neva-22b) into the NVIDIA VIA Video Search and Summarization (VSS) Agent framework. During inference, I encounter the following error when calling the /summarize endpoint:
2025-07-08 07:12:03,409 ERROR Traceback (most recent call last):
File "/opt/nvidia/via/via-engine/vlm_pipeline/process_base.py", line 180, in __process_int
result = self._process(**kwargs)
File "/opt/nvidia/via/via-engine/vlm_pipeline/vlm_pipeline.py", line 760, in _process
vlm_response_stats = ctx.ask(
File "/opt/nvidia/via/via-engine/models/custom/custom_model.py", line 94, in ask
return self._model.generate(
File "/opt/nvidia/via/via-engine/models/custom/custom_model.py", line 69, in generate
result = self._inference.generate(prompt, tensor, configs)
File "/home/hoangnt66/VSS-Agent/src/vss-engine/src/models/custom/demo/neva/inference.py", line 27, in generate
assert len(input) == 1
AssertionError
2025-07-08 07:12:03,414 ERROR Encountered error while processing chunk Chunk 0: start=0.0 end=44.4 file=/tmp/assets/7106084b-6ec1-442d-8ff5-f3ce656e7411/accident.mp4 of query b3cc2b39-0efe-4833-8063-64c440a73ea5 - An unknown error occurred
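From this call chain, my reading of the custom-model contract is that custom_model.py calls self._inference.generate(prompt, tensor, configs) on whatever the inference.py in MODEL_PATH provides. To see what actually arrives there, I can add temporary logging just before the failing check. This is a minimal sketch based on that assumption; only the parameter name input and the assert itself come from the traceback, while the class name and the logging are my own additions:

import logging

logger = logging.getLogger(__name__)

class Inference:  # class name assumed; keep whatever the demo sample defines
    def generate(self, prompt, input, configs):
        # Temporary diagnostics: log how many items the pipeline hands over
        # per chunk, and their shapes, before the original assertion fires.
        logger.error("neva demo generate(): received %d input item(s)", len(input))
        for idx, item in enumerate(input):
            shape = getattr(item, "shape", None)
            logger.error("  input[%d]: type=%s, shape=%s", idx, type(item).__name__, shape)
        assert len(input) == 1  # original check from the demo inference.py
        ...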
Environment
.env configuration:
VLM_MODEL_TO_USE=custom
MODEL_PATH=/home/hoangnt/VSS-Agent/src/vss-engine/src/models/custom/demo/neva/
MODEL_ROOT_DIR=/home/hoangnt/VSS-Agent/src/vss-engine/src/models/
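As a quick sanity check on this configuration (a throwaway helper of my own, not part of VIA), I can confirm that MODEL_PATH resolves to a directory that actually contains the demo inference.py:

import os

# Hypothetical check: confirm the custom-model directory from .env exists
# and contains the inference.py that the VIA engine will load.
model_path = os.environ.get(
    "MODEL_PATH",
    "/home/hoangnt/VSS-Agent/src/vss-engine/src/models/custom/demo/neva/",
)
print("MODEL_PATH exists:", os.path.isdir(model_path))
print("inference.py exists:", os.path.isfile(os.path.join(model_path, "inference.py")))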
cURL commands used:
curl -X POST http://localhost:8100/files \
-H "Content-Type: multipart/form-data" \
-F "purpose=vision" \
-F "media_type=video" \
-F "file=@example/accident.mp4"
curl -X POST http://localhost:8100/summarize \
-H "Content-Type: application/json" \
-d '{
"id": "7106084b-6ec1-442d-8ff5-f3ce656e7411",
"prompt": "You are an intelligent traffic system. You must monitor and note all events related to traffic accidents. Start and end each sentence with a timestamp.",
"model": "neva",
"api_type": "internal",
...
"vlm_input_width": 256,
"vlm_input_height": 256,
...
}'
Problem
The assertion assert len(input) == 1 fails inside inference.py, which suggests that the input passed to the custom Neva model is either malformed or unexpectedly contains more than one item. I’m not sure whether this is caused by the chunking logic, by the preprocessing pipeline, or by the way VIA passes input tensors to the custom model.
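If the root cause is simply that VIA samples more than one frame per chunk while the demo code assumes exactly one, a workaround I’m considering is to relax that assumption in the demo’s generate() method, roughly as below. This is entirely my own sketch (including the hypothetical _call_neva helper), not the shipped sample, and whether captioning every frame or only a single representative frame is the right approach for Neva-22B is part of what I’d like to confirm:

    def generate(self, prompt, input, configs):
        # Sketch: tolerate multiple sampled frames per chunk instead of asserting one.
        if not input:
            raise ValueError("generate() received an empty input list")
        captions = []
        for frame in input:
            # _call_neva is a hypothetical helper that sends one frame plus the
            # prompt to the Neva-22B endpoint and returns the generated text.
            captions.append(self._call_neva(prompt, frame, configs))
        # Join per-frame captions so the summarizer still receives one string per chunk.
        return "\n".join(captions)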