Inconsistent Temporal Annotations in VSS

Hardware Platform: 4× NVIDIA L40S GPUs
System Memory: 1 TB
Operating System: Ubuntu 22.04.5 LTS
NVIDIA GPU Driver Version: 550.120

Issue Type: Model Output Inaccuracy – Timestamp Misalignment

Issues Observed:
Timestamp Misalignment: The model correctly identifies advertisement brand names and segment types, but the timestamps it returns are often inaccurate and do not match the actual video content. As a result, the temporal boundaries of the classified segments are imprecise or hallucinated.

Use Case:
We are using NVIDIA Video Search and Summarization (VSS) to classify broadcast TV content into categories such as advertisements, promos, commercial breaks, and main programs. This classification task requires highly accurate segment timestamps.
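
To put a number on the misalignment, one option is to compare each predicted segment against a hand-labelled reference segment. The following is a minimal sketch only, not part of VSS: the `Segment` type, the helper names, and the sample values are all hypothetical, and it assumes timestamps are available as `HH:MM:SS` strings or as seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A classified span of video (hypothetical type, not a VSS structure)."""
    label: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def parse_hms(ts: str) -> float:
    """Convert an 'HH:MM:SS' or 'HH:MM:SS.sss' string to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals (1.0 = perfect overlap)."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def boundary_offsets(pred: Segment, truth: Segment) -> tuple[float, float]:
    """Signed start and end errors in seconds (positive = prediction is late)."""
    return pred.start - truth.start, pred.end - truth.end


# Illustrative values only, not taken from a real run.
truth = Segment("advertisement", parse_hms("00:01:30"), parse_hms("00:02:00"))
pred = Segment("advertisement", 95.0, 130.0)

print(temporal_iou(pred, truth))
print(boundary_offsets(pred, truth))
```

Reporting per-segment IoU together with the signed start and end offsets makes it easier to tell whether the timestamps drift consistently in one direction or are scattered.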

Model Deployment Configuration:
Deployment Type: VLM (VILA 1.5) – Local
Reranker: llama-3.2-nv-rerankqa-1B-v2:1.3.0 – Local
Embedding: llama-3.2-nv-embedqa-1B-v2:1.3.0 – Local
ASR: riva-asr:1.3.0 – Local
LLM: Llama 3.1 70B – Remote

inception

What kind of timestamps does your video source contain, and could you post a video clip here? Could you also attach the prompts you are using for this case?