Summarization API issues

Please provide the following information when creating a topic:

  • Hardware Platform (GPU model and numbers) : brev.dev launchable on CRUSOE
  • System Memory : 128 GB (unconfirmed)
  • Ubuntu Version : unknown
  • NVIDIA GPU Driver Version (valid for GPU only) : unknown
  • Issue Type : Question

While testing the summarization API, it does not seem to completely follow the provided documentation. I believe some API params are not expected by the API.
The following are the params provided in the API documentation:


I believe it was failing because of the values in rag_type, rag_batch_size, etc.
This is what worked for us:
{
  "id": "5e223c04-c32d-4cfb-a6be-227ec5aa5f98",
  "prompt": "Write a concise and clear dense caption for the provided video",
  "model": "nvila",
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "max_tokens": 512,
  "temperature": 0.4,
  "top_p": 1,
  "top_k": 100,
  "seed": 1,
  "num_frames_per_chunk": 10,
  "vlm_input_width": 0,
  "vlm_input_height": 0,
  "chunk_duration": 60,
  "summary_duration": 60,
  "caption_summarization_prompt": "Prompt for caption summarization",
  "summarize": true,
  "enable_chat": false
}
Verified with the code from the via-engine container.
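
For reference, this is roughly how we send that body with curl (a minimal sketch only; the host and port are placeholders for wherever the VSS/VIA backend is exposed, and summarize_request.json is just the JSON body above saved to a file):

# Hedged sketch: POST the working request body above to the /summarize endpoint.
# Host, port, and the file name summarize_request.json are placeholders.
curl -X POST "http://localhost:8100/summarize" \
  -H "Content-Type: application/json" \
  -d @summarize_request.json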

Requirement (vss-engine):

  1. After initiating summarization for a live stream via the API, the response is a stream.
    Is there a way to just trigger summarization for a live stream without receiving the response as a stream? We could then use the chat completion API to query the vector database for the summaries.

  2. There is currently a limit of 256 live streams that can be summarized. How can I increase this limit?

A possible reason is the model used. Currently you can only use NVILA; there may be problems when downloading the VILA model.

About the Requirement:

  1. Could you describe in detail how you use this API?
  2. How did you determine the limit of 256 live-stream summarizations?
  1. We were using curl to add the streams and then summarize them. The following is the code we used to summarize.

  2. We were able to successfully add 500 streams, but only 256 streams could be actively summarized with the earlier code. We checked the Gradio UI to verify the active streams, and it showed only 256 active streams even though there were 500 streams going through the VSS engine, which we confirmed with the tmp/assets folder (it contained 500 unique entries). Summarization performance also started dropping after 256 active streams. The Gradio UI kept showing only 256 active streams when we refreshed the active stream list, and summarization failed for the rest of the 500 streams we added.

Later on, we went through the vlm_pipeline.py file and found this code, which gave us the impression that there is a default limit set.

Is there any update on this?

About the 1st issue:

  • Trigger summarization using the /summarize API.
  • You can terminate this connection; the VIA backend will keep summarizing the live stream in the background.
  • You can then do Q&A via /chat/completions (see the sketch below).
  • One point: the vector DB will keep updating periodically as the stream continues.
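
As an illustration, a Q&A request might look roughly like this (a sketch only; the host, port, and stream id are placeholders reusing the id from the request above, and the exact request schema should be checked against the API documentation):

# Hedged sketch: query summaries for a live stream via /chat/completions.
# Host, port, and the id are placeholders; the field names mirror the /summarize
# request shown earlier and should be verified against the API docs.
curl -X POST "http://localhost:8100/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "id": "5e223c04-c32d-4cfb-a6be-227ec5aa5f98",
        "model": "nvila",
        "messages": [{"role": "user", "content": "Summarize the key events in this live stream."}],
        "max_tokens": 512
      }'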

About the 2nd issue:

  • You can add the following to the env section using overrides before deploying:
...
  env:
  - name: VSS_EXTRA_ARGS
    value: "--max-live-streams 512"
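
If you are deploying with Helm, the override file can then be applied with something like the following (a sketch; the release name, chart reference, and file name are placeholders that depend on your deployment):

# Hedged sketch: apply the overrides when installing/upgrading the Helm release.
# Release name, chart reference, and overrides.yaml path are placeholders.
helm upgrade --install vss-blueprint <chart-reference> -f overrides.yaml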