AssertionError during inference with custom Neva-22B VLM model on VIA Video & Summary Agent

I’m integrating a custom VLM (Neva-22B, https://ai.api.nvidia.com/v1/vlm/nvidia/neva-22b) into the NVIDIA VIA Video & Summary Agent framework. During inference, calling the /summarize endpoint produces the following error:

2025-07-08 07:12:03,409 ERROR Traceback (most recent call last):
  File "/opt/nvidia/via/via-engine/vlm_pipeline/process_base.py", line 180, in __process_int
    result = self._process(**kwargs)
  File "/opt/nvidia/via/via-engine/vlm_pipeline/vlm_pipeline.py", line 760, in _process
    vlm_response_stats = ctx.ask(
  File "/opt/nvidia/via/via-engine/models/custom/custom_model.py", line 94, in ask
    return self._model.generate(
  File "/opt/nvidia/via/via-engine/models/custom/custom_model.py", line 69, in generate
    result = self._inference.generate(prompt, tensor, configs)
  File "/home/hoangnt66/VSS-Agent/src/vss-engine/src/models/custom/demo/neva/inference.py", line 27, in generate
    assert len(input) == 1
AssertionError

2025-07-08 07:12:03,414 ERROR Encountered error while processing chunk Chunk 0: start=0.0 end=44.4 file=/tmp/assets/7106084b-6ec1-442d-8ff5-f3ce656e7411/accident.mp4 of query b3cc2b39-0efe-4833-8063-64c440a73ea5 - An unknown error occurred
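
For reference, the failing check lives in the demo's generate() entry point. The snippet below is my reading of that code, reconstructed from the traceback (custom_model.py calls self._inference.generate(prompt, tensor, configs)); only the signature and the line-27 assertion are confirmed, the comments are my interpretation:

class Inference:
    def generate(self, prompt, input, configs):
        # The demo appears to assume VIA hands over exactly one frame
        # tensor (one decoded chunk) per call; in my setup this check
        # fails instead.
        assert len(input) == 1  # <-- AssertionError raised here (line 27)
        ...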

Environment

.env configuration:

VLM_MODEL_TO_USE=custom
MODEL_PATH=/home/hoangnt/VSS-Agent/src/vss-engine/src/models/custom/demo/neva/
MODEL_ROOT_DIR=/home/hoangnt/VSS-Agent/src/vss-engine/src/models/

cURL commands used:

curl -X POST http://localhost:8100/files \
  -H "Content-Type: multipart/form-data" \
  -F "purpose=vision" \
  -F "media_type=video" \
  -F "file=@example/accident.mp4"
curl -X POST http://localhost:8100/summarize \
  -H "Content-Type: application/json" \
  -d '{
    "id": "7106084b-6ec1-442d-8ff5-f3ce656e7411",
    "prompt": "You are an intelligent traffic system. You must monitor and note all events related to traffic accidents. Start and end each sentence with a timestamp.",
    "model": "neva",
    "api_type": "internal",
    ...
    "vlm_input_width": 256,
    "vlm_input_height": 256,
    ...
  }'

Problem

The assertion assert len(input) == 1 fails inside inference.py, which suggests that the input passed to the custom Neva model is either malformed or unexpectedly contains more than one item. I’m not sure whether the issue lies in the chunking logic, in the preprocessing pipeline, or in the way VIA passes input tensors to the model.
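
To narrow this down, I plan to instrument generate() before the check fires so the log shows exactly what VIA hands over. The parameter names come from the traceback; the logging wrapper is my own addition, not demo code:

import logging

logger = logging.getLogger(__name__)

class Inference:
    def generate(self, prompt, input, configs):
        # Report how many items VIA passed and their shapes/types before the
        # single-item check, so the failure shows whether the batch is empty
        # or contains multiple chunks.
        shapes = [getattr(item, "shape", type(item).__name__) for item in input]
        logger.error("generate(): %d input item(s), shapes/types=%s",
                     len(input), shapes)
        assert len(input) == 1, f"expected exactly one chunk tensor, got {len(input)}"
        ...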

I have tried this on my side and it works properly. Could you add the environment variable below?

export NVIDIA_API_KEY=<your NVIDIA API key, not the NGC API key>
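
To confirm the key itself is valid, you can call the hosted neva-22b endpoint directly. The sketch below follows the API catalog sample for this model as far as I recall it, so treat the field names as assumptions; frame.jpg is a placeholder for any small test image:

import base64
import os

import requests

URL = "https://ai.api.nvidia.com/v1/vlm/nvidia/neva-22b"

# Encode a local test frame as a data URI, as the catalog sample does.
with open("frame.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    URL,
    headers={
        "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
        "Accept": "application/json",
    },
    json={
        "messages": [{
            "role": "user",
            "content": f'Describe the scene. <img src="data:image/jpeg;base64,{image_b64}" />',
        }],
        "max_tokens": 128,
    },
    timeout=60,
)
print(response.status_code, response.json())

A 200 response means the key and endpoint are fine and the problem is on the VIA side; a 401/403 points to the key.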

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.