Passing GstBuffer to TensorRT for inferencing

Hi guys,

I built a GStreamer pipeline in C++ that pulls GstBuffers from an appsink.
This is the pipeline (it’s a bit messy; I’ve been meaning to polish it):

thetauvcsrc ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw(memory:NVMM),format=RGBA" ! queue ! nvvidconv ! nvdewarper config-file=dewarp-file.txt ! m.sink_0 nvstreammux width=600 height=480 batch-size=4 num-surfaces-per-frame=4 ! nvmultistreamtiler width=1920 height=960 ! nvvidconv ! "video/x-raw(memory:NVMM),format=RGBA" ! appsink sync=0 emit-signals=true drop=true

With the help of NVIDIA’s developer forum, I used the NvBufSurface APIs and the CUDA APIs to map the NvBufSurface to a cv::cuda::GpuMat.

My question is: how do I use this buffer or image to run inference directly with a TensorRT engine?
DeepStream does come to mind, but the model I am using is Detectron2.

Even creating a TensorRT engine for Detectron2 requires generating the ONNX file twice, and it is frankly quite messy.

I read somewhere that nvvidconv already prepares the GstBuffer for GstNvinfer, so I was hoping to do something along similar lines.

Since I do have a TensorRT engine now, any advice or suggestions on how I can use this GstBuffer (or GpuMat) for inference directly (GPU → GPU)?
Any links or resources I can read would also help.

Since I am new to all this, there is a lot of stuff I do not know about but I’d appreciate any help so thanks in advance.


If you have the CUDA buffer pointer (from GpuMat?), you can pass it to TensorRT directly as below:



Hi @AastaLLL
Thanks for the reply!
Link seems helpful so I’ll explore it.

I want to confirm about a thing you mentioned though,
“If you have the CUDA buffer pointer (from GpuMat?)”

What I wanted was to take the buffer from memory:NVMM (GPU) in my gst-pipeline and pass it to TensorRT (GPU) directly, without any CPU involvement.

So I posted a question here yesterday and got this response, thanks to which I now have a GpuMat.

I wasn’t sure how to pass that GpuMat to TensorRT.
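For context, one step that usually sits between the GpuMat and the engine: the pipeline above delivers interleaved RGBA bytes in a pitched buffer (rows may be padded), while a TensorRT input binding is a densely packed tensor, often planar float (CHW). The sketch below is only a CPU reference for that index arithmetic, so the layout is easy to check; on the device the loop body would become a per-pixel CUDA kernel. The function name rgbaPitchToPlanarCHW, the R,G,B plane order, and the scale parameter are all assumptions made for illustration. The real channel order and normalization depend on how the model was exported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a pitch-linear, interleaved RGBA8 image into a dense planar float
// tensor (CHW, 3 channels, alpha dropped). CPU reference only.
// ASSUMPTION: plane order R,G,B and a single `scale` factor; use whatever
// the exported model actually expects.
std::vector<float> rgbaPitchToPlanarCHW(const std::uint8_t* src,
                                        std::size_t width,
                                        std::size_t height,
                                        std::size_t pitchBytes,
                                        float scale = 1.0f)
{
    assert(src != nullptr);
    assert(pitchBytes >= width * 4);  // rows may be padded, never shorter

    const std::size_t plane = width * height;
    std::vector<float> dst(3 * plane);

    for (std::size_t y = 0; y < height; ++y) {
        // Step rows by the pitch, not by width * 4.
        const std::uint8_t* row = src + y * pitchBytes;
        for (std::size_t x = 0; x < width; ++x) {
            const std::uint8_t* px = row + x * 4;  // R, G, B, A
            const std::size_t i = y * width + x;   // index inside one plane
            dst[0 * plane + i] = px[0] * scale;
            dst[1 * plane + i] = px[1] * scale;
            dst[2 * plane + i] = px[2] * scale;
        }
    }
    return dst;
}
```

With a GpuMat, the pitch corresponds to its step field and the source pointer to its data field; the destination would be a cudaMalloc'd buffer sized for the engine's input binding.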

Please refer to the samples:


for feeding an NvBufSurface to TensorRT.


Ok thanks!
I will read that.

One last thing, do you think NVIVAFILTER would be useful?
From what I read in other posts, NVIVAFILTER wraps NVMM memory in an EGLImage so that it can be used by CUDA.

We suggest using the NvBufSurface APIs since they are more flexible. The nvivafilter plugin is not open source, so further debugging may be difficult.


Thank you!

Hi. @rajupadhyay59!

Could you please share how you were able to create the Detectron2 TensorRT engine properly?

Following this link on my host (x86) PC, I created the ONNX file (converted.onnx).

Then, on my Jetson, I used the trtexec command to create the engine from the ONNX file.


Have you checked our detectron2 sample below:


Thanks for the response.

Yes, I have, and I generated the engine following that documentation. However, the documentation says that for good performance you should run inference in C++ and/or use DeepStream.

With Python, I am not sure how to get the buffer from memory:NVMM and keep it in GPU memory so that I can later pass it to inference.

Right now I am referring the and other sample c++ programs and trying to create a cpp inference code for detectron2.

If you ever have any suggestions/advice, please let me know.


You can check jetson_inference which is done with jetson-utils:

Or use Deepstream SDK instead.


Thanks for the reply!

DeepStream is my end goal, to be honest, but I had no idea how to use it with Detectron2.

Thanks for the link, I will look into it.

For now I created a custom C++ file which takes a Mat image, converts it into a gpuimage, and then runs inference, and it is working.

So now I know that I am able to run inference with Detectron2 from C++.

My next goal is to understand how to connect the gpuimage I get from eglFrame.frame.pPitch[0] to my inference code. (Any advice, or should I just create a new post?)

Also, if I understand it correctly, with DeepStream I don’t need to copy the image from NVMM memory into a gpuimage, right? It is passed directly to gst-nvinfer?



Suppose you have the GPU data buffer from eglFrame.frame.pPitch[0].
Then you can follow jetson_inference and pass the data buffer directly.

For example:

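// One device pointer per engine binding, indexed by binding slot.
mBindings = (void**)malloc(bindingSize);

// Inputs: point each input binding at the CUDA buffer that holds the frame.
for( uint32_t n=0; n < GetInputLayers(); n++ )
    mBindings[mInputs[n].binding] = [input CUDA ptr];

// Outputs: device buffers that TensorRT will write the results into.
for( uint32_t n=0; n < GetOutputLayers(); n++ )
    mBindings[mOutputs[n].binding] = [output CUDA ptr];

// Queue inference asynchronously on the CUDA stream.
context->enqueueV2(mBindings, mStream, NULL);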
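
To make the indexing in the snippet above concrete, here is a small self-contained sketch of how the bindings array is filled. It uses no TensorRT calls at all: LayerBinding and buildBindings are hypothetical names standing in for the mInputs / mOutputs bookkeeping, and in real code devPtr would be a CUDA device pointer such as the one from eglFrame.frame.pPitch[0] or a cudaMalloc'd output buffer. The point it illustrates is that each pointer is stored at the binding index the engine reports, which is not necessarily "inputs first, then outputs".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-layer info kept in mInputs / mOutputs.
struct LayerBinding {
    std::string name;     // tensor name in the engine (illustrative only)
    int         binding;  // slot index reported by the engine
    void*       devPtr;   // CUDA device pointer in real code
};

// Build the void* array that is handed to the execution context.
// Every slot must be filled, and each pointer goes at its own binding index.
std::vector<void*> buildBindings(const std::vector<LayerBinding>& inputs,
                                 const std::vector<LayerBinding>& outputs)
{
    std::vector<void*> bindings(inputs.size() + outputs.size(), nullptr);

    auto place = [&bindings](const LayerBinding& l) {
        assert(l.binding >= 0 &&
               static_cast<std::size_t>(l.binding) < bindings.size());
        assert(bindings[l.binding] == nullptr);  // no slot assigned twice
        bindings[l.binding] = l.devPtr;
    };
    for (const auto& l : inputs)  place(l);
    for (const auto& l : outputs) place(l);

    return bindings;
}
```

The resulting vector's data() pointer plays the role of mBindings in the call to enqueueV2. As far as I know, newer TensorRT releases deprecate enqueueV2 in favor of setting an address per tensor name, so check which API the installed version provides.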




@AastaLLL @DaneLLL
Thanks for all the help.
I was able to get it working with DeepStream.

I created a very simple GitHub repo for people who, like me, are looking for a way to use Detectron2 with DeepStream.

I’ll just link it here so people can find it, and I hope it turns out to be helpful. (I cannot promise I will maintain the repo; it is just for fun.)

Again, thanks for all the help.

