Issue Type: Model misbehaving/not working properly
We are using the Video Search and Summarization (VSS) model to detect the start and end timestamps of advertisements in TV channel videos. The model successfully identifies the advertised brand names, but the timestamps it returns are inaccurate and are often hallucinated.
We deployed the system using the provided model and configuration, but the timestamp detection does not function as expected for our specific use case. Additionally, we have tried using different prompts to improve the performance, but this has not resolved the issue.
Can you please provide some insights or guidance on how to address this issue?
Could you attach your video source and the prompts you are using? Please also attach the timestamps the model returned along with the real timestamps for the source, so we can reproduce and analyze the issue on our side.
Also, how much memory does your A100 have, and could you attach the config file you are using to deploy VSS?
@yuweiw
To provide more clarity on the issue, I’ve uploaded a sample video and a supporting document for reference. The document includes:
The prompt used for the VSS model.
The ground truth timestamps for the advertisements in the video.
The model results for the same video when segmented into 5-second chunks, 20-second chunks, and 5-minute chunks.
This analysis covers a 5-minute sample video and the ground-truth ad timestamps for it.
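To make the comparison between the ground truth and the model results easier to quantify for each chunk size, here is a minimal sketch of how the intervals could be scored. It is not part of VSS: the function names, the 0.5 IoU threshold, and the sample numbers are hypothetical and are not taken from the attached document. It assumes every advertisement is represented as a (start, end) pair in seconds.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec), an assumed representation


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two time intervals, in [0, 1]."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_intervals(
    truth: List[Interval],
    predicted: List[Interval],
    min_iou: float = 0.5,  # hypothetical threshold for counting a detection as a match
) -> List[Dict[str, Optional[object]]]:
    """Greedily pair each ground-truth ad with its best unused prediction."""
    used = set()
    results = []
    for gt in truth:
        best_idx, best_iou = None, 0.0
        for i, pred in enumerate(predicted):
            if i in used:
                continue
            iou = temporal_iou(gt, pred)
            if iou > best_iou:
                best_idx, best_iou = i, iou
        if best_idx is not None and best_iou >= min_iou:
            used.add(best_idx)
            pred = predicted[best_idx]
            results.append({
                "truth": gt,
                "pred": pred,
                "iou": best_iou,
                "start_err": pred[0] - gt[0],  # positive means the model started late
                "end_err": pred[1] - gt[1],    # positive means the model ended late
            })
        else:
            # No prediction overlaps this ad well enough: treat it as missed.
            results.append({"truth": gt, "pred": None, "iou": best_iou,
                            "start_err": None, "end_err": None})
    return results


# Made-up example values, only to show the call shape.
truth = [(30.0, 60.0), (120.0, 150.0)]
predicted = [(35.0, 60.0), (200.0, 230.0)]
for row in match_intervals(truth, predicted):
    print(row)
```

Running this once per chunk size (5 seconds, 20 seconds, 5 minutes) would give a per-ad start and end error plus a count of missed ads, which may make it easier to see whether the chunk size or the prompt is driving the timestamp drift.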
@yuweiw
I wanted to check in on the status of the issue with the VSS model. Have you had a chance to review the sample video and documents I shared? Let me know if you need any additional info.
Thank you
Hi @aamanmehraa89, we may not be able to further improve timestamp accuracy in the current version. We will gradually optimize the model in subsequent versions to improve it. Thanks