I'm testing the VILA 1.5 model (https://huggingface.co/Efficient-Large-Model/VILA1.5-7b) with the vss-engine:2.3.0 container, and every request returns the same fallback response:
{
  "id": "7f9ba7e5-a15b-4d12-bfc6-dda0c30ee130",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Sorry, I don't see that in the video.",
        "tool_calls": [],
        "role": "assistant"
      }
    }
  ],
  "created": 0,
  "model": "vila-1.5",
  "media_info": {
    "type": "offset",
    "start_offset": 0,
    "end_offset": 4000000000
  },
  "object": "summarization.completion",
  "usage": {
    "query_processing_time": 0,
    "total_chunks_processed": 0
  }
}
Here is the request I used:
curl -X POST http://localhost:8100/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "id": "9ff99617-42b1-4738-99bf-f9c33a1e3bed",
    "messages": [
      {
        "content": "How many people in the video are not wearing PPE helmets?",
        "role": "user",
        "name": "hoangnt66"
      }
    ],
    "model": "vila-1.5",
    "api_type": "internal",
    "response_format": { "type": "text" },
    "stream": true,
    "stream_options": { "include_usage": false },
    "max_tokens": 512,
    "temperature": 0.2,
    "top_p": 1,
    "top_k": 100,
    "seed": 10,
    "chunk_duration": 60,
    "chunk_overlap_duration": 10,
    "summary_duration": 60,
    "media_info": {
      "type": "offset",
      "start_offset": 0,
      "end_offset": 4000000000
    },
    "highlight": false,
    "user": "hoangnt66"
  }'
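
For context, the top-level "id" above is (as I understand the API) the asset id returned when I uploaded the video. Here is a minimal sketch of how I double-checked that the asset is actually registered, assuming the standard VSS /files endpoints (the exact paths may differ in your deployment):

# List the media files the engine knows about (assumed VSS files endpoint)
curl -X GET "http://localhost:8100/files?purpose=vision"

# Inspect the specific asset referenced by "id" in the chat request
curl -X GET "http://localhost:8100/files/9ff99617-42b1-4738-99bf-f9c33a1e3bed"

Both calls return without error on my side, so the asset itself seems to be known to the engine.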
It seems the engine never processes any video chunks (total_chunks_processed is 0), even though the media_info range covers the entire video duration, so every query falls back to the same canned message: "Sorry, I don't see that in the video."
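
In case the logs are useful, this is how I'm capturing the engine output while reproducing the issue (the container id is a placeholder; substitute whatever docker ps reports for the vss-engine:2.3.0 container):

# Find the running vss-engine container, then follow its logs while re-sending the request
docker ps --filter "ancestor=vss-engine:2.3.0"
docker logs -f <container-id> 2>&1 | grep -iE "chunk|error|media"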