Hardware Platform: 4× NVIDIA L40S GPUs
System Memory: 1 TB
Operating System: Ubuntu 22.04.5 LTS
NVIDIA GPU Driver Version: 550.120
Issue Type: Model Output Inaccuracy – Timestamp Misalignment
Issues Observed:
Timestamp Misalignment: The model correctly identifies advertisement brand names and segment types, but the timestamps it returns are often inaccurate and do not match the actual video content. The temporal boundaries of classified segments are therefore imprecise or hallucinated.
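
One way the misalignment could be quantified is by comparing each segment the model returns against a manually annotated reference segment. The snippet below is a minimal sketch of that idea, not part of VSS: the `Segment` class, the `temporal_iou` and `boundary_offsets` helpers, and the example times are all hypothetical placeholders chosen for illustration, not measurements from our runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def temporal_iou(pred: Segment, ref: Segment) -> float:
    """Intersection-over-union of two time intervals (0.0 = disjoint, 1.0 = identical)."""
    intersection = max(0.0, min(pred.end, ref.end) - max(pred.start, ref.start))
    union = (pred.end - pred.start) + (ref.end - ref.start) - intersection
    return intersection / union if union > 0 else 0.0


def boundary_offsets(pred: Segment, ref: Segment) -> tuple[float, float]:
    """Signed start/end errors in seconds (positive means the model boundary is late)."""
    return pred.start - ref.start, pred.end - ref.end


# Placeholder values for illustration only (not real model output).
reference = Segment("advertisement", start=120.0, end=150.0)
predicted = Segment("advertisement", start=130.0, end=160.0)

print(temporal_iou(predicted, reference))      # overlap quality of the two intervals
print(boundary_offsets(predicted, reference))  # (start error, end error) in seconds
```

Reporting both numbers per segment would show whether the boundaries are consistently shifted in one direction or scattered without a pattern.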
Use Case:
We are using the NVIDIA AI Blueprint for Video Search and Summarization (VSS) to classify broadcast TV content into categories such as advertisements, promos, commercial breaks, and main programs. This classification task requires highly accurate segment timestamps.
Model Deployment Configuration:
Deployment Type: VLM (VILA 1.5) – Local
Reranker: llama-3.2-nv-rerankqa-1B-v2:1.3.0 – Local
Embedding: llama-3.2-nv-embedqa-1B-v2:1.3.0 – Local
ASR: riva-asr:1.3.0 – Local
LLM: Llama 3.1 70B – Remote