NVMM memory

Hi DaneLLL,

Thank you very much. I appreciate your effort to code it up for me. It is working.


Hi DaneLLL,

As a follow-up step, my next aim is to read and process frames in OpenCV. I modified your code to read frames, and I am able to read them. However, the video lags (it is sluggish). I suspect that I am copying a lot of buffers around because of the format conversions.

Could you please check?

This is certainly not the most efficient way to read frames from the camera. Is there a better method that has been discussed and shared (with code) on this forum?


    #include <cstdio>
    #include <cstdlib>
    #include <gst/gst.h>
    #include <gst/gstinfo.h>
    #include <gst/app/gstappsink.h>
    #include <glib-unix.h>
    #include <dlfcn.h>

    #include <iostream>
    #include <sstream>
    #include <thread>
    #include "opencv2/objdetect/objdetect.hpp"
    #include "opencv2/highgui/highgui.hpp"
    #include "opencv2/imgproc/imgproc.hpp"
    #include "opencv/cv.h"

    using namespace std;
    using namespace cv;
    cv::Mat frame;

    #define USE(x) ((void)(x))

    static GstPipeline *gst_pipeline = nullptr;
    static string launch_string;   

    // Called by appsink when the stream reaches end-of-stream.
    static void appsink_eos(GstAppSink * appsink, gpointer user_data)
    {
        printf("app sink receive eos\n");
    //    g_main_loop_quit (hpipe->loop);
    }

    // Called by appsink for every new sample (one BGR frame per sample).
    static GstFlowReturn new_buffer(GstAppSink *appsink, gpointer user_data)
    {
        GstSample *sample = NULL;

        g_signal_emit_by_name (appsink, "pull-sample", &sample,NULL);

        if (sample)
        {
            GstBuffer *buffer = NULL;
            GstCaps   *caps   = NULL;
            GstMapInfo map    = {0};

            caps = gst_sample_get_caps (sample);
            if (!caps)
            {
                printf("could not get snapshot format\n");
            }
            gst_caps_get_structure (caps, 0);
            buffer = gst_sample_get_buffer (sample);
            gst_buffer_map (buffer, &map, GST_MAP_READ);

            printf("map.size = %lu\n", map.size);

            //render using map_info.data
            // The Mat wraps map.data without copying, so it is only valid
            // until gst_buffer_unmap() below.
            frame = cv::Mat(1080, 1920, CV_8UC3, (char *)map.data, cv::Mat::AUTO_STEP);
            // memcpy(frame.data,map.data,map.size);

            if (!frame.empty()) {
                // process or display 'frame' here, before the buffer is unmapped
            }

            gst_buffer_unmap(buffer, &map);

            gst_sample_unref (sample);
        }
        else
        {
            g_print ("could not make snapshot\n");
        }

        return GST_FLOW_OK;
    }

    int main(int argc, char** argv) {

        gst_init (&argc, &argv);

        GMainLoop *main_loop;
        main_loop = g_main_loop_new (NULL, FALSE);
        ostringstream launch_stream;
        int w = 1920;
        int h = 1080;
        // Callback order: eos, new_preroll, new_sample
        GstAppSinkCallbacks callbacks = {appsink_eos, NULL, new_buffer};

        // camera (NVMM) -> nvvidconv (I420, CPU) -> videoconvert (BGR) -> appsink
        launch_stream
        << "nvcamerasrc ! "
        << "video/x-raw(memory:NVMM), width="<< w <<", height="<< h <<", framerate=30/1 ! "
        << "nvvidconv ! "
        << "video/x-raw, format=I420, width="<< w <<", height="<< h <<" ! "
        << "videoconvert" << " ! " << "video/x-raw, format=(string)BGR " << " ! "
        << "appsink name=mysink ";

        launch_string = launch_stream.str();

        g_print("Using launch string: %s\n", launch_string.c_str());

        GError *error = nullptr;
        gst_pipeline  = (GstPipeline*) gst_parse_launch(launch_string.c_str(), &error);

        if (gst_pipeline == nullptr) {
            g_print( "Failed to parse launch: %s\n", error->message);
            return -1;
        }
        if(error) g_error_free(error);

        GstElement *appsink_ = gst_bin_get_by_name(GST_BIN(gst_pipeline), "mysink");
        gst_app_sink_set_callbacks (GST_APP_SINK(appsink_), &callbacks, NULL, NULL);

        gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_PLAYING);

        // The pipeline has to be kept running at this point (for example by
        // running the main loop) for the appsink callback to receive frames.
        //g_main_loop_run (main_loop);

        gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_NULL);

        // Release the references taken above.
        gst_object_unref(appsink_);
        gst_object_unref(gst_pipeline);
        g_main_loop_unref(main_loop);

        g_print("going to exit \n");
        return 0;
    }

We are probably not able to help more on this. OpenCV is developed by a third party and its implementation uses CPU buffers, so an extra memory copy is inevitable.

Other OpenCV experts, please share your experience.

Hi DaneLLL,

Would it be easy for you to validate my understanding of the data path?

This is what I am thinking:

Camera/ISP --(write)--> camera buffer (DDR, aka NVMM) --(read by the nvvidconv plugin, followed by a write)--> CPU buffer (DDR, but not NVMM), format YUV 420 --(read and write by the videoconvert plugin)--> yet another CPU buffer (DDR, not NVMM) --(read by appsink)--> HDMI display.

Does this flow look right? If so, it seems far from optimal. Surely there is some low-hanging fruit here for optimization.

Thanks for your help in getting me to this point.

Would it be possible to read NVMM buffers directly into OpenCV (cv::Mat) as YUV420 buffers? Can such a thing be accomplished via the OpenVX / NVX framework?


We have nvgstcamera_capture in VisionWorks-1.6-Samples.
We also have 11_camera_object_identification in tegra_multimedia_api demonstrating Argus + openCV + caffe.

Please give it a try.

Hi DaneLLL,

Thanks for the suggestion. I tried it, and it does seem to give me the fps I want (4K @ 60). What would be a way to get frames from this example into cv::Mat? Is there an example that I can follow?


For nvgstcamera_capture, you have to use VisionWorks APIs.
For 11_camera_object_identification, it is in tegra_multimedia_api\samples\11_camera_object_identification\opencv_consumer_lib\opencv_consumer.cpp

Hi DaneLLL

Thanks again. My main aim is to read from two cameras concurrently (1080p @ 60 from each camera) and do further processing on the video from each camera.

Is there a multi-camera example? What is the best way of accomplishing this? Do we need multi-threaded code, so that we can schedule one thread per camera on one of the A57 cores on the TX1? I am wondering how we would prevent the ‘imaginary queue’ between the camera and the OpenCV processing pipeline from overflowing. How do we detect overflow and underflow conditions? I am not too worried about underflow, as the frame fetch will likely be event driven (some event signalling the availability of a new frame). Please advise.


We have two argus samples about multi camera sources:

However, neither of them integrates with OpenCV, so the performance is unknown. One bottleneck is that OpenCV seems to support only BGR. The HW engine NvVideoConverter can do I420->RGBA, but cvtColor(in, out, CV_RGBA2BGR) is still needed to get BGR.

Have you tried doing the processing in CUDA directly?

Please refer to the attachment. It patches 11_camera_object_identification to profile cvtColor(in, out, CV_RGBA2BGR), which takes 2-3 ms per frame at 1080p30 (with the onboard ov5693). That looks acceptable for 1080p30, but I am not sure it is good enough for two 1080p60 streams. FYR.
0001-multimedia_api-profile-opencv.patch.txt (4.99 KB)
argus2opencv.txt (17.8 KB)

Thanks DaneLLL. I will try it out. I appreciate your help.

While I absorb your help and run your code, I would like to ask the following:

  1. What is the data path of pixels in the 11_camera_object_identification code? I would guess that the ISP writes to DDR, then that frame is read back for the RGBA2BGR conversion and written out in BGR format. But I am not sure this is all. Before the ISP writes RGBA to DDR, it probably does a Bayer-to-RGB conversion, which would also involve at least one frame write to and one frame read from DDR. I am curious to know how many trips to DDR the frame makes before I start my OpenCV processing.

  2. If I take the approach of processing frames directly in CUDA (as you alluded to before), would it be more optimal (in terms of DDR frame traffic) than the 11_camera_object_identification data path? Does the GPU read frames directly from DDR, or is there an internal on-chip cache that it can use for reading input frames?

  3. About running the 11_camera_object_identification code: I am unable to find the following file, which is needed to run that code, on my TX1 (please help):



ubuntu@tegra-ubuntu:~$ sudo find / -name bvlc_reference_caffenet.caffemodel
[sudo] password for ubuntu:
find: ‘/run/user/1000/gvfs’: Permission denied



  1. This is the optimal way to link OpenCV. Actually, we encourage users to use VisionWorks and direct CUDA processing.

  2. For tegra multimedia api, please refer to tegra_multimedia_api\argus\samples\cudaHistogram
    For gstreamer, please refer to

OpenCV is excluded in this case.

  3. These files are not used once the patch is applied. You can run without them.

Hi DaneLLL,

  1. Between VisionWorks and CUDA processing, which one is more optimal? Could you please give more insight into the data paths? Is there a way to get YUV420 (NV12) format data from the camera directly into CUDA? Is there a CUDA example that I can follow?

  2. Another, orthogonal question: I am interested in using the 11_camera_object_identification example for our objects/use case. Could you please give (or point to) the model files so that I can run the original example?



Please refer to tegra_multimedia_api\argus\samples\cudaHistogram

Please follow tegra_multimedia_api\samples\11_camera_object_identification\README