The alert generation time does not match the video content

Please provide the following information when creating a topic:

  • Hardware Platform (GPU model and numbers)

8 * H20

  • System Memory

1TB

  • Ubuntu Version

rhel9.4

  • NVIDIA GPU Driver Version (valid for GPU only)

gpu operator→25.3.0, driver→570.124.06


Here are the bugs:

  1. The event time reported in the alert does not match the time at which the event actually occurs in the video.
  2. The same alert prompt generated an alarm, but no matching scene was found when the alert was used to clip the video.
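One way to check the second bug by hand is to cut a padded window around the reported alert time and watch it directly. The sketch below only builds an `ffmpeg` command line. The function name `build_clip_command`, the 10-second padding, and the file names are assumptions for illustration and are not part of VSS.

```python
def build_clip_command(src, dst, alert_start_s, alert_end_s, pad_s=10.0):
    """Return an ffmpeg argument list that cuts [start - pad, end + pad] from src.

    Hypothetical helper: times are offsets in seconds from the start of the video.
    """
    if alert_end_s < alert_start_s:
        raise ValueError("alert_end_s must not be earlier than alert_start_s")
    # Pad both sides so a slightly shifted timestamp still lands inside the clip.
    start = max(0.0, alert_start_s - pad_s)
    end = alert_end_s + pad_s
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",  # seek to the padded start (input option)
        "-to", f"{end:.3f}",    # stop reading at the padded end (input option)
        "-i", src,
        "-c", "copy",           # stream copy: fast, but cuts snap to keyframes
        dst,
    ]


if __name__ == "__main__":
    # Example values only; pass the list to subprocess.run() to actually cut the clip.
    print(" ".join(build_clip_command("traffic.mp4", "alert_clip.mp4", 95.0, 105.0)))
```

If the lane change shows up only in the padding and not in the reported window itself, that points to a timestamp offset rather than a false alert.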

Here is the summary prompt: Identify the behavior of vehicles changing from one lane to another in the traffic video, driving over white or yellow solid lines during the process. Focus on the lane-changing process, including the periods before, during, and after the lane change. Please provide the vehicle’s characteristic information (including the vehicle’s color, license plate number, and other features) and the time when the event occurred.

Here is the alert prompt: A vehicle changes from one lane to another, driving over a white or yellow solid line during the process.

When creating a topic, there is a limitation that only one image can be provided. Below are the additional images for reference.

Which VLM model are you using? Could you try the nvila-15b-hires VLM?

        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: nvila
          - name: MODEL_PATH
            value: ngc:nvidia/tao/nvila-highres:nvila-lite-15b-highres-lita

And are you using the llama-70b LLM?

Yes, the recommended version is being used.

I have tried your video on my side, and the correct event is basically reported. If you want a more accurate time period, you might need to fine-tune the model on a large amount of data related to this event.

Prompt: Identify the behavior of illegal lane change. Please report the time period when it occurred, the color of the vehicle, and the license plate information.

Caption Summary Prompt: You will be given captions from sequential clips of a video. Aggregate captions in the format start_time:end_time:caption based on whether captions are related to one another or create a continuous scene.
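To illustrate what this prompt asks for, here is a minimal sketch of the aggregation step done by hand. In VSS the LLM does this aggregation; the sketch is only an approximation. It assumes each caption line has the form `start_time:end_time:caption` with times in plain seconds, and it treats two captions as related only when their text is identical, which is a crude stand-in for the LLM's judgment. The name `aggregate_captions` and the `max_gap_s` parameter are hypothetical.

```python
def aggregate_captions(lines, max_gap_s=0.0):
    """Merge consecutive 'start:end:caption' lines that describe the same scene.

    Assumptions: start and end are seconds (numeric, no colons), and two
    captions are "related" only if their text is identical.
    """
    merged = []
    for line in lines:
        # Split on the first two colons only, so the caption may contain colons.
        start_text, end_text, caption = line.split(":", 2)
        start, end, caption = float(start_text), float(end_text), caption.strip()
        same_scene = (
            merged
            and merged[-1][2] == caption
            and start - merged[-1][1] <= max_gap_s
        )
        if same_scene:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], end, caption)
        else:
            merged.append((start, end, caption))
    return merged


if __name__ == "__main__":
    clips = [
        "0:10:white car crosses a solid line",
        "10:20:white car crosses a solid line",
        "20:30:no lane change",
    ]
    for start, end, caption in aggregate_captions(clips):
        print(f"{start:g}:{end:g}:{caption}")
```

Note that the merged interval can be no finer than the clip boundaries, so the reported time period is limited by the chunk length used for captioning.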

Summary Aggregation Prompt: Based on the available information, generate a traffic report that is organized chronologically and in logical sections. This should be a concise, yet descriptive summary of all the important events. The format should be intuitive and easy for a user to read and understand what happened. Format the output in Markdown so it can be displayed nicely.

**There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.**
