Deepstream 4.0 : Storing NV12 frame buffers to a file from PGIE sink pad callback

Environment: GeForce RTX 2060 with Driver Version: 430.26 and CUDA Version: 10.2

I am looking to save the individual (decoded) NV12 frames into a separate data store, so I tried to extend the code in deepstream_test3_app.c as explained below.

** Registering callback for every frame received over PGIE sink (same as streammux src pad) **

  streammux_src_pad = gst_element_get_static_pad (pgie, "sink");
  if (!streammux_src_pad)
    g_print ("Unable to get pgie sink pad\n");
  else
    gst_pad_add_probe (streammux_src_pad, GST_PAD_PROBE_TYPE_BUFFER,
        streammux_src_pad_buffer_probe, NULL, NULL);

** Inside streammux_src_pad_buffer_probe(), called for every buffer received **

static GstPadProbeReturn
streammux_src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
    gpointer u_data)
{
  ...
  GstMapInfo in_map_info;
  NvBufSurface *surface = NULL;
  GstBuffer *inbuf = gst_pad_probe_info_get_buffer (info);

  memset (&in_map_info, 0, sizeof (in_map_info));
  if (!gst_buffer_map (inbuf, &in_map_info, GST_MAP_READ)) {
    g_print ("Error: Failed to map gst buffer\n");
    goto error;
  }

  surface = (NvBufSurface *) in_map_info.data;

  // Writing surface buffer into file
  fwrite (surface->surfaceList[0].dataPtr, surface->surfaceList[0].dataSize, 1, fp1);
  ...
}
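For completeness, the parts elided above also have to unmap the buffer and return a probe result. A minimal self-contained sketch of the whole callback (assuming batch-size=1, CPU-accessible surface memory, and an fp1 FILE* opened elsewhere, as in the snippet above):

#include <stdio.h>
#include <string.h>
#include <gst/gst.h>
#include "nvbufsurface.h"

extern FILE *fp1;   /* assumed to be opened before the pipeline starts */

static GstPadProbeReturn
streammux_src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
    gpointer u_data)
{
  GstMapInfo in_map_info;
  GstBuffer *inbuf = gst_pad_probe_info_get_buffer (info);

  memset (&in_map_info, 0, sizeof (in_map_info));
  if (!gst_buffer_map (inbuf, &in_map_info, GST_MAP_READ)) {
    g_print ("Error: Failed to map gst buffer\n");
    return GST_PAD_PROBE_OK;
  }

  /* The mapped data is an NvBufSurface describing the whole batch. */
  NvBufSurface *surface = (NvBufSurface *) in_map_info.data;

  /* With batch-size=1 the batch holds exactly one frame. */
  fwrite (surface->surfaceList[0].dataPtr,
      surface->surfaceList[0].dataSize, 1, fp1);

  gst_buffer_unmap (inbuf, &in_map_info);
  return GST_PAD_PROBE_OK;
}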

Then I try to turn the dumped frames/files into an MJPEG video using ffmpeg:

ffmpeg -s:v 1920x1080 -r 20 -pix_fmt nv12 -i img_%d.raw out.mjpg

The result: when I play out.mjpg with VLC, only a few frames play before it stops; it is not the full video like the source. Since I am only partially through, I assume I have missed a required loop/indexing step somewhere, which is not clear from the examples and documentation I have referred to.

Ref : /root/deepstream_sdk_v4.0_x86_64/sources/gst-plugins/gst-dsexample/gstdsexample.cpp
Ref : NVIDIA DeepStream SDK API Reference: NvBufSurface Struct Reference

Additional reference : /deepstream_sdk_v4.0_x86_64/sources/gst-plugins/gst-nvinfer/gstnvinfer_allocator.cpp

/* Calculate pointers to individual frame memories in the batch memory and
 * insert in the vector. */
tmem->frame_memory_ptrs[i] = (char *) tmem->surf->surfaceList[i].dataPtr;

So can you please let me know what I am missing?

The requirement: I must NOT transform the NV12 buffer (arriving as input to PGIE) in any way, but just faithfully store/pass the decoded frames on for another application to use.

You need to get the batch meta first and traverse batch_meta->frame_meta_list, then get the surfaceList entry for every frame. Could you refer to the code below for the details?

static GstPadProbeReturn
tiler_src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
    gpointer u_data)
{
#ifdef DUMP_JPG
    GstBuffer *buf = (GstBuffer *) info->data;
    NvDsMetaList * l_frame = NULL;
    NvDsMetaList * l_user_meta = NULL;
    NvDsUserMeta *user_meta = NULL;
    NvDsInferSegmentationMeta* seg_meta_data = NULL;
    // Get original raw data
    GstMapInfo in_map_info;
    char* src_data = NULL;
    if (!gst_buffer_map (buf, &in_map_info, GST_MAP_READ)) {
        g_print ("Error: Failed to map gst buffer\n");
        /* The map failed, so there is nothing to unmap here. */
        return GST_PAD_PROBE_OK;
    }
    NvBufSurface *surface = (NvBufSurface *)in_map_info.data;

    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

    for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
      l_frame = l_frame->next) {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);
        /* Validate user meta */
        for (l_user_meta = frame_meta->frame_user_meta_list; l_user_meta != NULL;
            l_user_meta = l_user_meta->next) {
            user_meta = (NvDsUserMeta *) (l_user_meta->data);
            if (user_meta && user_meta->base_meta.meta_type == NVDSINFER_SEGMENTATION_META) {
                seg_meta_data = (NvDsInferSegmentationMeta*)user_meta->user_meta_data;
            }
        }

        src_data = (char*) malloc(surface->surfaceList[frame_meta->batch_id].dataSize);
        if(src_data == NULL) {
            g_print("Error: failed to malloc src_data \n");
            continue;
        }
        /* Stage the frame from device memory into the host buffer. */
        cudaMemcpy((void*)src_data,
                   (void*)surface->surfaceList[frame_meta->batch_id].dataPtr,
                   surface->surfaceList[frame_meta->batch_id].dataSize,
                   cudaMemcpyDeviceToHost);
        dump_jpg(src_data,
                 surface->surfaceList[frame_meta->batch_id].width,
                 surface->surfaceList[frame_meta->batch_id].height,
                 seg_meta_data, frame_meta->source_id, frame_meta->frame_num);

        if(src_data != NULL) {
            free(src_data);
            src_data = NULL;
        }
    }
    gst_buffer_unmap (buf, &in_map_info);
#endif
    return GST_PAD_PROBE_OK;
}

Thanks for your response.

While I check the code you suggested, I have a quick question

I confirmed from the debug output below that the memory type is 3 (NVBUF_MEM_CUDA_UNIFIED) for the probe data I receive on the PGIE sink:

NvBufSurface gpuId=0, batchSize=1, numFilled=1, isContiguous=0, memtype=3,
 surfaceList width=1920, height=1080 colorFormat=6 layout=0 dataSize=3110400 dataPtr=0x7fa2f2c00000
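(That dataSize is consistent with an unpadded NV12 frame: a full-resolution Y plane plus a half-height interleaved UV plane, i.e. 1920 × 1080 × 3/2 = 3110400 bytes, which also matches the blocksize used in the gst-launch command further below.)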

So I assumed cudaMemcpyDeviceToHost was NOT required (or was redundant), since such memory is accessible from both CPU and GPU; but your code suggests I need to do a cudaMemcpy. That is why I was able to use the pointer directly in my fwrite:

fwrite(surface->surfaceList[0].dataPtr, surface->surfaceList[0].dataSize, 1, fp1);

I am a bit confused between the SDK docs and your response.

Yes, if the memory type is NVBUF_MEM_CUDA_UNIFIED, you can access the memory directly from the CPU side. I was just showing how to get the raw data corresponding to each frame.
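In other words, the staging copy only matters when the memory is not CPU-mappable. A minimal sketch of that branch, reusing the surface, frame_meta, and fp1 names from the snippets above and the memType field printed in the debug output (a dGPU memory type that cudaMemcpy can read is assumed; cuda_runtime.h provides cudaMemcpy):

NvBufSurfaceParams *p = &surface->surfaceList[frame_meta->batch_id];

if (surface->memType == NVBUF_MEM_CUDA_UNIFIED) {
  /* Unified memory: the pointer is valid on the CPU, write directly. */
  fwrite (p->dataPtr, p->dataSize, 1, fp1);
} else {
  /* Device-only memory: stage through a host buffer first. */
  char *host = (char *) malloc (p->dataSize);
  if (host != NULL) {
    cudaMemcpy (host, p->dataPtr, p->dataSize, cudaMemcpyDeviceToHost);
    fwrite (host, p->dataSize, 1, fp1);
    free (host);
  }
}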

Hi,

I tried the code you suggested, traversing batch_meta->frame_meta_list as shown below, but I still see no change in behavior (perhaps because batch-size=1 is configured in dstest3_pgie_config.txt?).

GstMapInfo in_map_info;
NvBufSurface *surface         = NULL;
NvDsMetaList *l_frame         = NULL;
GstBuffer *inbuf              = gst_pad_probe_info_get_buffer (info);
NvDsBatchMeta *batch_meta     = gst_buffer_get_nvds_batch_meta (inbuf);
char file_name[256];

memset (&in_map_info, 0, sizeof (in_map_info));
if (!gst_buffer_map (inbuf, &in_map_info, GST_MAP_READ)) {
  g_print ("Error: Failed to map gst buffer\n");
  goto error;
}

surface = (NvBufSurface *) in_map_info.data;

for (l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {
  NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);

  // build the file name
  snprintf (file_name, sizeof (file_name), "img_%d.raw", frame_meta->frame_num);

  FILE *fp1 = fopen (file_name, "wb");
  if (fp1 == NULL) {
    g_print ("Error: failed to open %s\n", file_name);
    continue;
  }

#if 0 // Seems redundant but added to check NVIDIA support response; no behavior change with this either
  char *src_data = (char *) malloc (surface->surfaceList[frame_meta->batch_id].dataSize);
  if (src_data != NULL) {
    cudaMemcpy ((void *) src_data,
                (void *) surface->surfaceList[frame_meta->batch_id].dataPtr,
                surface->surfaceList[frame_meta->batch_id].dataSize,
                cudaMemcpyDeviceToHost);
    fwrite (src_data, surface->surfaceList[frame_meta->batch_id].dataSize, 1, fp1);
    fclose (fp1);
    free (src_data);
  }
#endif

  fwrite (surface->surfaceList[frame_meta->batch_id].dataPtr,
          surface->surfaceList[frame_meta->batch_id].dataSize, 1, fp1);
  fclose (fp1);
}

The example you suggested essentially refers to https://devtalk.nvidia.com/default/topic/1060782/b/t/post/5372713/, which converts from NV12 to BGRA; but for my requirement I am just trying to store NV12 as-is and let the consuming application decide whether to use it directly or apply any transforms.

Do let me know if I am missing anything here.

How many sources did you add?

Just one source.

I am running the deepstream-test3 app, with my modifications, like this:
./deepstream-test3-app file:///root/deepstream_sdk_v4.0_x86_64/samples/streams/sample_720p.h264

It should work if there is just one source. Please check whether any frames are missing.

Without my changes, that sample program with the sample video runs over 1441 frames; with my changes I create/write/dump the same number of files, so I believe no frames are missed.

Is there any other debug data I can print to confirm what I am missing?
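For instance, inside the frame loop I could print a per-frame trace like this to verify every frame is visited exactly once (a sketch using the standard NvDsFrameMeta and NvBufSurfaceParams fields):

g_print ("source_id=%u batch_id=%u frame_num=%d dataSize=%u dataPtr=%p\n",
    frame_meta->source_id, frame_meta->batch_id, frame_meta->frame_num,
    surface->surfaceList[frame_meta->batch_id].dataSize,
    surface->surfaceList[frame_meta->batch_id].dataPtr);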

Maybe you can try, as in https://devtalk.nvidia.com/default/topic/1060782/b/t/post/5372713/, to dump the frames directly to JPG to check where it goes wrong.

I tried the command below, and I can confirm the stored file is valid by opening it with GIMP:

gst-launch-1.0 filesrc blocksize=3110400 location=/root/image_dump/stream_0_img_1440.raw ! \
    'video/x-raw,format=(string)NV12,width=(int)1920,height=(int)1080,framerate=(fraction)0/1' ! \
    jpegenc ! 'image/jpeg, width=(int)1920,height=(int)1080,framerate=(fraction)0/1' ! \
    filesink location=/root/image_dump/img_1440_from_h264.jpg

So maybe the process of creating an MJPEG video directly from the NV12 frames with ffmpeg is not right?
Is there a gst-launch command you can suggest to ensure I get a valid MJPEG video?
The individual frames converted to JPEG look OK, but the assembled video plays only a few frames.
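One pipeline I am considering (an untested sketch, assuming multifilesrc pushes each raw file as one buffer with the given caps; the out_mjpg.avi name is just an example):

gst-launch-1.0 multifilesrc location="img_%d.raw" index=0 \
    caps="video/x-raw,format=NV12,width=1920,height=1080,framerate=20/1" ! \
    jpegenc ! avimux ! filesink location=out_mjpg.avi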

Not quite sure, but I think you can write a simple Python script using OpenCV to do this.

I am able to play the video generated with “ffmpeg -s:v 1920x1080 -pix_fmt nv12 -i img_%d.raw out.mjpg” with ffplay, but not with VLC, which either plays extremely jerkily or plays just a few frames and stops!
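(One possible factor: this invocation omits the -r 20 input rate I used earlier, so ffmpeg falls back to its default rate for the sequence, which may explain the broken timing in VLC.)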

At least this confirms that my understanding of the NvBufSurface interface for storing the NV12 frames is valid.

Thanks for your support in highlighting that I need to read the frame_meta list to go through all the frames (serialized into batched frame buffers from multiple sources by streammux).