Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.2
• JetPack Version (valid for Jetson only)
• TensorRT Version: 8.6
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs): questions
• How to reproduce the issue? (This is for bugs. Including which sample app is used, the configuration file content, the command line used and other details for reproducing)
• Requirement details (This is for new requirements. Including the module name, for which plugin or sample application, and the function description): smart record / nvmsgbroker
What is the best way to ensure metadata and video are aligned frame-accurately when:
Metadata is sent out via ZMQ,
SmartRecord is triggered later via a separate ZMQ command?
How can I guarantee the correct PTS mapping between the metadata and the recorded MP4, especially when the metadata timestamps are received before the SmartRecord is triggered?
Would embedding metadata (e.g., via nvdsmeta) directly into the MP4 help? Or should the metadata be stored externally as a sidecar file (e.g., Avro/JSON), indexed by frame-level PTS?
Below is my pipeline; it consumes two RTSP sources (25 fps).
Are you planning to store metadata in the mp4 container?
You can convert the metadata to H.264/H.265 SEI, then store the SEI in the MP4 container. But this requires you to parse the MP4 file and extract the SEI afterwards.
Can you share your goals for doing so?
Refer to this topic
If your application is written in native code, please refer to /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdsmetautils/sei_serialization .
I think this is feasible. Each json file has the PTS and source id of the corresponding frame.
The NvDsFrameMeta is sent out via a custom ZMQ nvmsgbroker. A ZMQ dealer (or client) consumes the metadata, analyzes the bounding boxes and class objects, and determines when to start and stop recording the MP4. A message is then sent back via the ZMQ dealer in the form of a bus_message containing the start and stop commands. Smart Record will start or stop recording the camera stream based on these commands.
The ZMQ dealer also begins recording the metadata (bounding boxes, frame counts, class names, etc.) in a serialized format such as NDJSON. By metadata I mean the bounding boxes and class labels per frame.
The issue arises when trying to combine the metadata file with the MP4 to generate a final video with bounding boxes. The boxes become misaligned when using frame counts. I’ve tried creating a rolling PTS buffer before Smart Record to align frames, but the bounding boxes are still misaligned.
I also need a cache to dump frames from approximately 5 seconds before the recording starts. While all components are working, the main problem is the misalignment between the frame-by-frame metadata and the recorded video.
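The roughly 5-second pre-trigger cache can be kept as a time-windowed deque keyed by PTS. A minimal sketch in plain C++ (the `RollingMetaCache` name and the NDJSON payload are illustrative, not part of any DeepStream API):

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Illustrative pre-trigger cache: keeps roughly the last `window_ns` of
// per-frame metadata (PTS + serialized NDJSON record), so the ~5 s before
// a SmartRecord start command can be flushed to the sidecar file.
struct RollingMetaCache {
    struct Entry {
        uint64_t pts_ns;     // frame PTS in nanoseconds
        std::string ndjson;  // serialized per-frame metadata record
    };

    uint64_t window_ns;      // e.g. 5 s = 5'000'000'000 ns
    std::deque<Entry> entries;

    explicit RollingMetaCache(uint64_t window) : window_ns(window) {}

    void push(uint64_t pts_ns, std::string record) {
        entries.push_back({pts_ns, std::move(record)});
        // Evict everything older than the window relative to the newest PTS.
        while (!entries.empty() &&
               entries.back().pts_ns - entries.front().pts_ns > window_ns) {
            entries.pop_front();
        }
    }
};
```

Evicting by PTS distance rather than by frame count keeps the window correct even if a source briefly drops frames.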
I am not sure how sei_serialization would help on this.
This may be related to the implementation of SmartRecord.
When SmartRecord is triggered, if the first frame of the recorded file is not an IDR frame, the undecodable frames at the start of the file will be discarded on playback until an IDR frame is encountered.
When merging the metadata file with MP4, you need to record the PTS of the metadata and merge it when the video frame matches the metadata PTS.
If the metadata is converted into SEI, these SEIs will have the same PTS as the video frames. If you don’t know how to use SEI, the previous solution can also achieve a similar goal.
Would I need to keep a rolling cache of the PTS and probe on the recordbin_<camera_index>_queue, and once a start command is issued, dump the PTS cache and begin writing PTS values to a file until Smart Record is stopped?
static GstPadProbeReturn rolling_buffer_probe(
    GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
    auto *probeCtx = static_cast<PadProbeCtx*>(user_data);
    auto *ctx = probeCtx->ctx;
    guint camIdx = probeCtx->camIdx;

    if (auto *buf = GST_PAD_PROBE_INFO_BUFFER(info)) {
        GstClockTime src_pts = GST_BUFFER_DTS(buf);
        GstClockTime batch_pts = GST_BUFFER_PTS(buf);
        // store (camIdx, src_pts, batch_pts) in the rolling cache ...
    }
    return GST_PAD_PROBE_OK;  // keep buffers flowing downstream
}
And create the probe per recordbin_queue.
for (guint i = 0; i < numCams; ++i) {
    std::string queueName = "recordbin" + std::to_string(i) + "-queue";
    GstElement *queueElem = pipe.getElement(queueName);
    GstPad *sinkpad = gst_element_get_static_pad(queueElem, "sink");
    auto *probeCtx = new PadProbeCtx{&ctx, i};
    ctx.probeIds[i] = gst_pad_add_probe(
        sinkpad,
        GST_PAD_PROBE_TYPE_BUFFER,
        rolling_buffer_probe,
        probeCtx,
        +[](gpointer data) { delete static_cast<PadProbeCtx*>(data); }  // GDestroyNotify
    );
    gst_object_unref(sinkpad);
}
For the SEI approach, wouldn’t it be better to store the metadata in a separate file rather than parse the MP4 and write it into the container, since the events I create have to be uploaded to the cloud?
I don’t know your code implementation, but from your pipeline, I think you can achieve your goal by getting the source id and pts of the video frame downstream of nvstreamdemux.
Start storing metadata before triggering NvDsSRStart, and stop storing after NvDsSRStop (manually stopping smart record) or smart_record_callback (automatic stop when the duration is reached).
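One way to wire up that gating is a small per-source state machine that buffers a pre-roll while idle and flushes it when the start command fires. A sketch in plain C++ (`MetaRecorder` is illustrative; the real start/stop calls would be issued alongside NvDsSRStart and NvDsSRStop):

```cpp
#include <cstdint>
#include <deque>
#include <fstream>
#include <string>
#include <utility>

// Illustrative per-source sidecar writer: buffers a short pre-roll of NDJSON
// records while idle, then flushes them and appends live records between the
// start and stop commands.
class MetaRecorder {
public:
    void on_frame(uint64_t pts_ns, const std::string& ndjson) {
        if (recording_) {
            out_ << ndjson << '\n';
        } else {
            pre_.emplace_back(pts_ns, ndjson);
            if (pre_.size() > kMaxPre) pre_.pop_front();  // cap the pre-roll
        }
    }
    void start(const std::string& path) {  // call when issuing NvDsSRStart
        out_.open(path);
        for (const auto& e : pre_) out_ << e.second << '\n';  // flush pre-roll
        pre_.clear();
        recording_ = true;
    }
    void stop() {  // call from NvDsSRStop or the smart-record done callback
        recording_ = false;
        out_.close();
    }
private:
    static constexpr size_t kMaxPre = 125;  // ~5 s at 25 fps; a PTS window works too
    std::deque<std::pair<uint64_t, std::string>> pre_;
    std::ofstream out_;
    bool recording_ = false;
};
```

The pre-roll cap mirrors SmartRecord's own cache duration, so the sidecar file covers the same time span as the recorded video.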
This is just one option; whether to use it depends on your production environment.
I have probed the record-queue to inspect the rolling buffer, and when recording starts, I probe on the record-conv element. Both probes are placed downstream of nvstreamdemux.
Additionally, I switched the container format from MP4 to MKV, as MKV appears to better preserve PTS values compared to MP4.
When matching the PTS from the probe dumps with the PTS in the MKV file, the synchronization now appears to be accurate.
I will test with additional streams (more than two) to verify if synchronization is consistently maintained.