DeepStream 6.1 new streammux CUDA unified vs. default problem


I am starting to migrate from DeepStream 6.0.1 to 6.1.
The new nvstreammux no longer does any scaling.
My pipeline looks like this:

rtspsrc →
decodebin →
nvvideoconvert (nvbuf-memory-type=nvbuf-mem-cuda-unified, NV12->RGBA, NVMM, 1280x720) →
nvstreammux →
nvinfer (detector) →
nvinfer (classifier) →
appsink

My problem is that with 6.0.1 I got a CUDA unified buffer, because I set it in both nvvideoconvert and nvstreammux.
With 6.1, the new nvstreammux no longer has the nvbuf-memory-type property.
So I thought that setting nvbuf-memory-type=nvbuf-mem-cuda-unified in nvvideoconvert would be enough to receive a CUDA unified buffer in appsink.

Instead, I always get the buffer as NVBUF_MEM_DEFAULT (which is NVBUF_MEM_CUDA_DEVICE on dGPU).
So I cannot use NvBufSurfaceMap in appsink (according to the documentation, NvBufSurfaceMap only works with NVBUF_MEM_CUDA_UNIFIED on dGPU).
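
The reasoning in this post can be written down as a small check. This is only a sketch: `MemType`, `resolve_on_dgpu`, `mappable_on_dgpu` and `name_of` are hypothetical names, and the enum is a stand-in for `NvBufSurfaceMemType` so that the snippet compiles without the DeepStream SDK. It assumes the dGPU behaviour described above (DEFAULT means CUDA device memory, and only CUDA unified memory can be mapped).

```cpp
#include <string_view>

// Stand-in for NvBufSurfaceMemType from nvbufsurface.h (assumption: only the
// four dGPU-relevant types are modelled here; this is not the SDK type).
enum class MemType {
  Default,      // NVBUF_MEM_DEFAULT
  CudaPinned,   // NVBUF_MEM_CUDA_PINNED
  CudaDevice,   // NVBUF_MEM_CUDA_DEVICE
  CudaUnified,  // NVBUF_MEM_CUDA_UNIFIED
};

// On dGPU, the default memory type resolves to CUDA device memory.
constexpr MemType resolve_on_dgpu(MemType t) {
  return t == MemType::Default ? MemType::CudaDevice : t;
}

// Per the documentation cited in the post, only CUDA unified memory can be
// mapped for CPU access with NvBufSurfaceMap on dGPU.
constexpr bool mappable_on_dgpu(MemType t) {
  return resolve_on_dgpu(t) == MemType::CudaUnified;
}

// Human-readable name, handy for logging in an appsink callback.
constexpr std::string_view name_of(MemType t) {
  switch (t) {
    case MemType::Default:     return "NVBUF_MEM_DEFAULT";
    case MemType::CudaPinned:  return "NVBUF_MEM_CUDA_PINNED";
    case MemType::CudaDevice:  return "NVBUF_MEM_CUDA_DEVICE";
    case MemType::CudaUnified: return "NVBUF_MEM_CUDA_UNIFIED";
  }
  return "unknown";
}
```

In a real callback the same decision would be made on `surface->memType` before calling NvBufSurfaceMap.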

Is it normal to receive DEFAULT, when I set it to CUDA unified in the nvvideoconvert?
As far as I understand, the new nvstreammux doesn’t touch the frame; is that correct?

I guess I have two options: use another nvvideoconvert after nvstreammux (I don’t know if that’s possible), or use the NvBufSurface API to convert the buffer from DEFAULT to CUDA unified in the appsink.
Am I correct?

Thanks for the help!

nvbuf-memory-type is not supported by the new nvstreammux. See the “Gst-nvstreammux New” page of the DeepStream 6.1.1 Release documentation.

Why do you need the buffer?

You can remove the nvvideoconvert after decodebin and before the nvstreammux because nvstreammux can mux NV12 videos.

You can add nvvideoconvert after nvinfer.

I need the buffer to be RGBA because the nvinfer models need RGBA frames; that’s why I put nvvideoconvert before nvstreammux.
I want to do some processing on the buffer in appsink, which is why I want CUDA unified memory: it lets me access the buffer from CPU memory easily.
What I don’t understand is why, having set CUDA unified in nvvideoconvert before nvstreammux, I receive a DEFAULT buffer in appsink. At some point the buffer type is changed, and that’s what I’d like to understand.

A side question: if I put nvvideoconvert right after nvstreammux and before the first nvinfer to convert from NV12 to RGBA, will it convert all the frames in the batch?

About my last question, I just saw this in the documentation:

  • Inputs
    • Gst Buffer batched buffer
    • NvDsBatchMeta
    • Format: NV12, I420, P010_10LE, BGRx, RGBA (NVMM/RAW)

Do you mean your model’s input layer needs RGBA rather than RGB?
nvinfer supports NV12 input. If the model needs RGB/BGR input, setting the “model-color-format” parameter in the nvinfer configuration file to the correct value is enough; it has nothing to do with the nvinfer input format. See the Gst-nvinfer page of the DeepStream 6.1.1 Release documentation.
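
As a rough illustration of that setting, the helper below emits the relevant fragment of an nvinfer configuration file. It is a sketch only: `ModelColorFormat` and `nvinfer_color_fragment` are hypothetical names, the values 0/1/2 for RGB/BGR/GRAY follow my reading of the Gst-nvinfer documentation, and a real configuration needs many more keys.

```cpp
#include <sstream>
#include <string>

// Values of nvinfer's model-color-format key (assumed: 0 RGB, 1 BGR, 2 GRAY).
enum class ModelColorFormat { RGB = 0, BGR = 1, GRAY = 2 };

// Builds only the fragment that selects the model's color format.
// The pipeline can keep feeding NV12 frames; nvinfer converts internally.
std::string nvinfer_color_fragment(ModelColorFormat fmt) {
  std::ostringstream out;
  out << "[property]\n"
      << "model-color-format=" << static_cast<int>(fmt) << "\n";
  return out.str();
}
```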

You must make sure nvstreammux’s “width” and “height” properties are the same as the video resolution. See the “Gst-nvstreammux New” page of the DeepStream 6.1.1 Release documentation.
Sometimes, when there are multiple input streams with different resolutions, nvstreammux must scale all streams to the same resolution to build the batch, so the output is a new buffer and not the one from the video decoder.

You don’t have to convert the format to RGBA before nvinfer. If you do so, it will convert all frames in the batch.

Yes, the model input is RGB.
I cannot use width and height: they are absent in the new streammux, which in theory does not scale.

Do you know why the buffer arrives as NVBUF_MEM_DEFAULT in my appsink when I set it to nvbuf-mem-cuda-unified in nvvideoconvert before streammux? Does the new streammux modify the buffer?

For testing I created a nvvideoconvert after streammux like this:

And I still get NVBUF_MEM_DEFAULT and so NvBufSurfaceMap() is failing.

The same happens if I put the nvvideoconvert with CUDA unified just before the appsink:

Do you have any idea of how I can have a CUDA unified buffer in the appsink?

I just tried with the old streammux, and I correctly get a CUDA Unified buffer for all the frames in the batch:

I think I gave the solution in my previous post: you need to remove the nvvideoconvert after decodebin and move it after nvstreammux.

When nvvideoconvert’s input resolution and format are the same as its output resolution and format, the buffer (data) simply passes through the plugin unchanged.

I tried to move it after streammux.
I also tried to move it just before appsink.
And in appsink I cannot receive the buffer as CUDA Unified. I always receive it as NVBUF_MEM_DEFAULT, that is CUDA device in my case as I’m using dGPU.

Can you post the code?

I am preparing a simple program, then I’ll post the code.

From the graph you posted earlier in this topic (#10 by tvai), the nvvideoconvert sink pad and src pad have the same caps, which means you did not follow my suggestion. Please remove the nvvideoconvert after decodebin, add an nvvideoconvert after the classifier nvinfer, and put a capsfilter with RGBA format after that nvvideoconvert. The sink pad of nvvideoconvert will then show NV12 caps and its src pad will show RGBA caps.
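
The suggested element order can be sketched as a pipeline description suitable for `gst_parse_launch()`. This is an untested sketch with a single nvinfer for brevity: `build_pipeline_description` and its two path parameters are hypothetical, the element names `mux` and `sink` are arbitrary, and it assumes `USE_NEW_NVSTREAMMUX=yes` is set in the environment (the old nvstreammux would also need `width` and `height`).

```cpp
#include <string>

// Builds a description in which the RGBA / CUDA unified conversion happens
// after batching and inference, followed by a capsfilter forcing RGBA.
// `video_path` and `infer_config` are caller-supplied (hypothetical) paths.
std::string build_pipeline_description(const std::string& video_path,
                                       const std::string& infer_config) {
  return "filesrc location=" + video_path +
         " ! decodebin ! mux.sink_0 "
         "nvstreammux name=mux batch-size=1 "
         "! nvinfer config-file-path=" + infer_config +
         " ! nvvideoconvert nvbuf-memory-type=nvbuf-mem-cuda-unified "
         "! video/x-raw(memory:NVMM),format=RGBA "
         "! appsink name=sink";
}
```

The resulting string would be passed to `gst_parse_launch()`, and the appsink retrieved by name to connect the `new-sample` callback.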

It was just a test to see whether nvbuf-memory-type was working; I have no problem converting NV12 to RGBA.
My problem is that I don’t receive a CUDA Unified buffer in my appsink; I always receive an NVBUF_MEM_DEFAULT one.

My current pipeline is like this:

rtspsrc →
decodebin →
nvvideoconvert (nvbuf-memory-type=nvbuf-mem-cuda-unified, NV12->RGBA, NVMM, 1280x720) →
nvstreammux →
nvinfer (detector) →
nvinfer (classifier) →
appsink

And even if I change from NV12 to RGBA, the buffer type is not changed to CUDA Unified like I specify in nvbuf-memory-type.

So if nvvideoconvert doesn’t change the buffer to CUDA Unified, do you know how I can copy the CUDA Device buffer to CPU RAM?
I don’t see an NvBufSurface API for this kind of conversion, from a CUDA DEVICE buffer to a RAM buffer (for the whole batch). Maybe I missed something.

The thing is that with the old streammux I correctly received a CUDA Unified buffer, on which I could call NvBufSurfaceMap() in appsink (under some conditions, not for every frame) and then process the buffer in CPU memory. I’d like to do the same with the new streammux.
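
For the CPU-side processing step, one detail worth keeping in mind is that surface rows are usually padded to a pitch larger than width × 4 bytes. The sketch below packs one pitched RGBA frame into a contiguous buffer. `pack_rgba_rows` is a hypothetical helper using only the standard library; in a DeepStream callback I would expect `src` to come from `surfaceList[i].mappedAddr.addr[0]` and `pitch` from `surfaceList[i].pitch` after a successful NvBufSurfaceMap, but treat that mapping as an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies one pitched RGBA frame into a tightly packed buffer.
// Returns an empty vector if the arguments are inconsistent.
std::vector<std::uint8_t> pack_rgba_rows(const std::uint8_t* src,
                                         std::size_t width,
                                         std::size_t height,
                                         std::size_t pitch) {
  constexpr std::size_t kBytesPerPixel = 4;  // RGBA, 8 bits per channel
  const std::size_t row_bytes = width * kBytesPerPixel;
  if (src == nullptr || pitch < row_bytes) {
    return {};
  }
  std::vector<std::uint8_t> packed(row_bytes * height);
  for (std::size_t y = 0; y < height; ++y) {
    // Skip the padding at the end of each source row.
    std::memcpy(packed.data() + y * row_bytes, src + y * pitch, row_bytes);
  }
  return packed;
}
```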

When I have time I’ll send you a piece of code reproducing the issue I have.

Hi Fiona,

Thanks for your time.
First of all, I apologize: I forgot to add the capsfilter after nvvideoconvert. With the capsfilter it works fine, but only if nvvideoconvert is placed after nvstreammux.
In that case I correctly receive a CUDA Unified buffer in appsink.

But I wonder why it doesn’t work if nvvideoconvert is before new nvstreammux?

Look at this code and the resulting pipeline:


CUDA_VER ?= 11.6

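APP := simple-pipeline-test

# DeepStream release discussed in this thread; override on the command line if needed.
NVDS_VERSION ?= 6.1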


LIB_INSTALL_DIR ?= /opt/nvidia/deepstream/deepstream-$(NVDS_VERSION)/lib/

SRCS := $(wildcard *.cpp)

PKGS := gstreamer-1.0 gstreamer-app-1.0

OBJS := $(SRCS:.cpp=.o)

CXXFLAGS += -I../../../includes \
		-I /usr/local/cuda-$(CUDA_VER)/include \
		-I /opt/nvidia/deepstream/deepstream-$(NVDS_VERSION)/sources/includes

CXXFLAGS += $(shell pkg-config --cflags $(PKGS)) --std=c++17 -DDEBUG -g -gdwarf-3

LIBS := $(shell pkg-config --libs $(PKGS)) \
		-L/usr/local/cuda-$(CUDA_VER)/lib64/ -lcudart \
		-L$(LIB_INSTALL_DIR) -lnvdsgst_meta -lnvds_meta -lnvbufsurface \
		-lcuda -lstdc++fs -Wl,-rpath,$(LIB_INSTALL_DIR)

all: $(APP)

%.o: %.cpp Makefile
	$(CXX) -c -o $@ $(CXXFLAGS) $<

$(APP): $(OBJS) Makefile
	$(CXX) -o $(APP) $(OBJS) $(LIBS)

clean:
	rm -rf $(OBJS) $(APP)


#include <nvbufsurface.h>
#include <gstnvdsmeta.h>
#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include <gstreamer-1.0/gst/rtsp/gstrtsptransport.h>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string_view>
#include <cassert>
#include <cstdlib>
#include <cstring>

GstElement* g_pipeline = nullptr;

// Writes the configuration file consumed by the new nvstreammux.
void write_nvstreammux_config(const std::filesystem::path& path)
{
  std::ofstream f(path);
  f << "[property]" << std::endl;
  f << "algorithm-type=1" << std::endl;
  f << "overall-max-fps-n=5" << std::endl;
  f << "overall-max-fps-d=1" << std::endl;
  f << "overall-min-fps-n=5" << std::endl;
  f << "overall-min-fps-d=1" << std::endl;
  f << "max-same-source-frames=1" << std::endl;
  f << "adaptive-batching=1" << std::endl;
  f << "max-fps-control=0" << std::endl;
}

// Writes the nvinfer (detector) configuration file.
void write_nvinfer_config(const std::filesystem::path& path,
                          const std::filesystem::path& model_dir)
{
  std::ofstream f(path);
  f << "[property]" << std::endl;
  f << "gpu-id=0" << std::endl;
  f << "net-scale-factor=0.0039215697906911373" << std::endl;
  f << std::endl;
  f << "input-dims=3;544;960;0" << std::endl;
  f << "uff-input-blob-name=input_1" << std::endl;
  f << "batch-size=16" << std::endl;
  f << "model-color-format=0" << std::endl;
  f << "num-detected-classes=3" << std::endl;
  f << "cluster-mode=1" << std::endl;
  f << "interval=0" << std::endl;
  f << "gie-unique-id=1" << std::endl;
  f << "output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid" << std::endl;
  f << std::endl;
  f << "tlt-model-key=tlt_encode" << std::endl;
  f << "tlt-encoded-model=" << (model_dir / "resnet34_peoplenet_pruned.etlt").c_str() << std::endl;
  f << "model-engine-file=" << (model_dir / "resnet34_peoplenet_pruned.etlt_b16_gpu0_fp16.engine").c_str() << std::endl;
  f << "labelfile-path=" << (model_dir / "labels.txt").c_str() << std::endl;
  f << std::endl;
  f << "network-mode=2" << std::endl;
  f << "network-type=0" << std::endl;
  f << "process-mode=1" << std::endl;
  f << std::endl;
  f << "[class-attrs-all]" << std::endl;
  f << "pre-cluster-threshold=0.4" << std::endl;
  f << "eps=0.7" << std::endl;
  f << "minBoxes=1" << std::endl;
  f << "nms-iou-threshold=0.6" << std::endl;
}

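// gst_caps_foreach() callback: prints one caps structure.
gboolean print_cap(GstCapsFeatures* /*features*/, GstStructure* structure, gpointer /*user_data*/)
{
    gchar* s = gst_structure_to_string(structure);
    std::cout << "caps: " << s << std::endl;
    g_free(s);  // gst_structure_to_string() returns a newly allocated string
    return TRUE;
}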

void on_decodebin_pad_added(GstElement* /*element*/, GstPad* pad, gpointer data)
{
  std::cout << "[on_decodebin_pad_added]" << std::endl;
  // `data` is the element that follows decodebin (videorate in main()).
  GstElement* downstream = reinterpret_cast<GstElement*>(data);
  GstCaps* caps = gst_pad_get_current_caps(pad);
  if (caps != nullptr)
  {
    gst_caps_foreach(caps, print_cap, nullptr);
    gst_caps_unref(caps);
  }
  GstPad* sink_pad = gst_element_get_static_pad(downstream, "sink");
  gst_pad_link(pad, sink_pad);
  gst_object_unref(sink_pad);
}

void print_batch_meta(NvDsBatchMeta* batch_meta, bool print_detections)
{
  for (NvDsFrameMetaList* frame_meta_list = batch_meta->frame_meta_list;
       frame_meta_list != nullptr;
       frame_meta_list = frame_meta_list->next)
  {
    NvDsFrameMeta* frame_meta = static_cast<NvDsFrameMeta*>(frame_meta_list->data);
    if (frame_meta != nullptr)
    {
      std::cout
          << "[on_new_sample] streammux_source_id="
          << frame_meta->source_id
          << ", frame_num="
          << frame_meta->frame_num
          << std::endl;

      if (print_detections)
      {
        for (NvDsObjectMetaList* obj_meta_list = frame_meta->obj_meta_list;
             obj_meta_list != nullptr;
             obj_meta_list = obj_meta_list->next)
        {
          NvDsObjectMeta* nv_ds_object_meta = static_cast<NvDsObjectMeta*>(obj_meta_list->data);
          if (nv_ds_object_meta == nullptr)
            continue;

          std::cout
              << "[on_new_sample] "
              << nv_ds_object_meta->unique_component_id
              << ", "
              << nv_ds_object_meta->class_id
              << ", "
              << nv_ds_object_meta->confidence
              << ", "
              << nv_ds_object_meta->obj_label
              << std::endl;
        }
      }
    }
  }
}

GstFlowReturn on_new_sample(GstElement* element, gpointer /*user_data*/)
{
  GstSample* sample = gst_app_sink_pull_sample(GST_APP_SINK_CAST(element));
  assert(sample != nullptr);

  GstBuffer* buffer = gst_sample_get_buffer(sample);
  assert(buffer != nullptr);

  GstMapInfo map_info;
  std::memset(&map_info, 0, sizeof(map_info));
  if (gst_buffer_map(buffer, &map_info, GST_MAP_READ))
  {
    // For NVMM buffers the mapped data is the NvBufSurface describing the batch.
    NvBufSurface* surface = reinterpret_cast<NvBufSurface*>(map_info.data);
    if (surface->memType == NVBUF_MEM_CUDA_UNIFIED)
    {
      std::cout << "MEMORY TYPE: NVBUF_MEM_CUDA_UNIFIED" << std::endl;
      if (NvBufSurfaceMap(surface, -1, -1, NVBUF_MAP_READ) == 0)
      {
        if (NvBufSurfaceUnMap(surface, -1, -1) == 0)
          std::cout << "NvBufSurfaceMap/NvBufSurfaceUnMap successful" << std::endl;
      }
    }
    else if (surface->memType == NVBUF_MEM_DEFAULT)
    {
      std::cout << "MEMORY TYPE: NVBUF_MEM_DEFAULT" << std::endl;
    }
    else
    {
      std::cout << "MEMORY TYPE: " << surface->memType << std::endl;
    }

    gst_buffer_unmap(buffer, &map_info);
  }

  NvDsBatchMeta* batch_meta = gst_buffer_get_nvds_batch_meta(buffer);
  if (batch_meta != nullptr && batch_meta->num_frames_in_batch > 0)
  {
    print_batch_meta(batch_meta, true);
  }

  // The pulled sample is owned by the caller and must be released.
  gst_sample_unref(sample);
  return GST_FLOW_OK;
}

gboolean bus_call(GstBus* /*bus*/, GstMessage* msg, gpointer data)
{
  using namespace std::string_view_literals;

  GMainLoop* loop = reinterpret_cast<GMainLoop*>(data);
  switch (GST_MESSAGE_TYPE(msg))
  {
    case GST_MESSAGE_EOS:
      std::cout << "End of stream" << std::endl;
      g_main_loop_quit(loop);
      break;
    case GST_MESSAGE_ERROR:
    {
      gchar* debug = nullptr;
      GError* error = nullptr;
      gst_message_parse_error(msg, &error, &debug);
      std::cerr
          << "Error from element "
          << GST_OBJECT_NAME(msg->src)
          << ": "
          << error->message
          << std::endl;
      if (debug)
        std::cerr << "Error details " << debug << std::endl;
      g_free(debug);
      g_error_free(error);
      g_main_loop_quit(loop);
      break;
    }
    case GST_MESSAGE_STATE_CHANGED:
    {
      GstState old_state;
      GstState new_state;
      gst_message_parse_state_changed(msg, &old_state, &new_state, nullptr);
      const char* name = GST_OBJECT_NAME(msg->src);
      // Dump the pipeline graph once the top-level pipeline reaches PLAYING.
      if (name == "pipeline"sv && new_state == GST_STATE_PLAYING)
        gst_debug_bin_to_dot_file(GST_BIN(g_pipeline), GST_DEBUG_GRAPH_SHOW_ALL, "pipeline");
      break;
    }
    default:
      break;
  }
  return TRUE;
}

int main(int argc, char* argv[])
{
  using namespace std::string_view_literals;

  if (argc != 3)
  {
    std::cerr << "Usage: simple-pipeline-test MODEL_ABSOLUTE_DIR FILE_PATH" << std::endl;
    return 1;
  }

  write_nvinfer_config("/tmp/nvinfer-config.txt", argv[1]);
  write_nvstreammux_config("/tmp/nvstreammux-config.txt");

  gst_init(&argc, &argv);

  GstElement* src = gst_element_factory_make("filesrc", "src");
  GstElement* decodebin  = gst_element_factory_make("decodebin", "decodebin");
  GstElement* videorate = gst_element_factory_make("videorate", "videorate");
  GstElement* videoconvert = gst_element_factory_make("nvvideoconvert", "videoconvert");
  GstElement* capsfilter = gst_element_factory_make("capsfilter", "capsfilter");
  GstElement* streammux = gst_element_factory_make("nvstreammux", "streammux");
  GstElement* infer = gst_element_factory_make("nvinfer", "infer");
  GstElement* sink = gst_element_factory_make("appsink", "sink");
  GstElement* pipeline = gst_pipeline_new("pipeline");

  assert(src != nullptr);
  assert(decodebin != nullptr);
  assert(videorate != nullptr);
  assert(streammux != nullptr);
  assert(videoconvert != nullptr);
  assert(capsfilter != nullptr);
  assert(infer != nullptr);
  assert(sink != nullptr);
  assert(pipeline != nullptr);

  g_pipeline = pipeline;

  g_object_set(src, "location", argv[2], nullptr);

  g_object_set(videorate, "drop-only", TRUE, "max-rate", 2, nullptr);

  // 3 == nvbuf-mem-cuda-unified
  g_object_set(videoconvert, "nvbuf-memory-type", 3, nullptr);

  GstCaps* caps = gst_caps_new_simple("video/x-raw", "format", G_TYPE_STRING, "RGBA", nullptr);
  GstCapsFeatures* features = gst_caps_features_new("memory:NVMM", nullptr);
  gst_caps_set_features(caps, 0, features);
  g_object_set(capsfilter, "caps", caps, nullptr);
  gst_caps_unref(caps);

  const char* new_streammux = std::getenv("USE_NEW_NVSTREAMMUX");
  if (new_streammux != nullptr && new_streammux == "yes"sv)
  {
    // New nvstreammux: no width, height or nvbuf-memory-type properties.
    g_object_set(streammux,
                 "attach-sys-ts", TRUE,
                 "batch-size", 1,
                 "sync-inputs", TRUE,
                 "max-latency", 200000000,
                 "num-surfaces-per-frame", 1,
                 "config-file-path", "/tmp/nvstreammux-config.txt",
                 nullptr);
  }
  else
  {
    // Old nvstreammux: scales to width x height and honours nvbuf-memory-type.
    g_object_set(streammux,
                 "width", 1920,
                 "height", 1080,
                 "batched-push-timeout", 200000,
                 "live-source", TRUE,
                 "enable-padding", TRUE,
                 "attach-sys-ts", TRUE,
                 "buffer-pool-size", 4,
                 "batch-size", 1,
                 "nvbuf-memory-type", 3,
                 nullptr);
  }

  g_object_set(infer, "config-file-path", "/tmp/nvinfer-config.txt", nullptr);
  g_object_set(sink,
               "emit-signals", TRUE,
               "async", FALSE,
               "sync", FALSE,
               "drop", TRUE,
               "wait-on-eos", FALSE,
               "enable-last-sample", FALSE,
               "max-buffers", 10,
               nullptr);

  g_signal_connect(decodebin, "pad-added", G_CALLBACK(on_decodebin_pad_added), videorate);

  g_signal_connect(sink, "new-sample", G_CALLBACK(on_new_sample), nullptr);

  gst_bin_add_many(GST_BIN(pipeline), src, decodebin, videorate, videoconvert,
                   capsfilter, streammux, infer, sink, nullptr);

  GstPad* src_pad = gst_element_get_static_pad(capsfilter, "src");
  assert(src_pad != nullptr);
  GstPad* sink_pad = gst_element_get_request_pad(streammux, "sink_0");
  assert(sink_pad != nullptr);

  // Link outside assert() so the calls are not compiled out with NDEBUG.
  const bool pads_linked = gst_pad_link(src_pad, sink_pad) == GST_PAD_LINK_OK;
  const bool head_linked = gst_element_link_many(src, decodebin, nullptr);
  const bool convert_linked = gst_element_link_many(videorate, videoconvert, capsfilter, nullptr);
  const bool tail_linked = gst_element_link_many(streammux, infer, sink, nullptr);
  assert(pads_linked && head_linked && convert_linked && tail_linked);
  gst_object_unref(src_pad);
  gst_object_unref(sink_pad);

  GMainLoop* loop = g_main_loop_new(nullptr, FALSE);
  GstBus* bus = gst_pipeline_get_bus(GST_PIPELINE(pipeline));
  guint bus_watch_id = gst_bus_add_watch(bus, bus_call, loop);
  gst_object_unref(bus);

  gst_element_set_state(pipeline, GST_STATE_PLAYING);

  std::cout << "Running..." << std::endl;
  g_main_loop_run(loop);

  std::cout << "Returned, stopping playback" << std::endl;
  gst_element_set_state(pipeline, GST_STATE_NULL);
  std::cout << "Deleting pipeline" << std::endl;
  gst_object_unref(GST_OBJECT(pipeline));
  g_source_remove(bus_watch_id);
  g_main_loop_unref(loop);

  return 0;
}

mkdir model
wget -O model/resnet34_peoplenet_pruned.etlt
wget -O model/labels.txt

nvidia-docker container run -it --rm -v $(pwd):/app -u $(id -u):$(id -g) --workdir=/app
USE_NEW_NVSTREAMMUX=yes ./simple-pipeline-test $(pwd)/model video.mp4

The resulting buffer is not CUDA Unified as I specified in nvvideoconvert.
Does new nvstreammux modify the input buffer format? I would expect the buffer to be RGBA+CUDA Unified until the appsink.

nvstreammux generates new buffers for the batch, so the right way is to do the conversion after nvstreammux.
