Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) Jetson Xavier
• DeepStream Version 5.1
• Issue Type (questions, new requirements, bugs) question
I have turned the gstdsexample plugin into a plugin that blurs the input RGBA buffers using a CUDA filter. The frames are extracted as cv::cuda::GpuMat. However, the FPS of the blurring section is too low: around 60 FPS for a 25x25 filter on a 1280x720 sample. This is much slower than it should be, since blurring a single frame normally runs at about 300 FPS. I don't know what I am missing here.
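For reference, a standalone measurement of that kind can be reproduced roughly like this (a minimal sketch, not my exact code: the Gaussian filter choice, the sample values, and the synchronize-based timing are assumptions):

#include <chrono>
#include <iostream>
#include <cuda_runtime.h>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudafilters.hpp>

int main ()
{
  /* Hypothetical standalone benchmark: 25x25 Gaussian blur on a 1280x720 RGBA frame. */
  cv::cuda::GpuMat d_mat (720, 1280, CV_8UC4);
  d_mat.setTo (cv::Scalar::all (128));

  cv::Ptr<cv::cuda::Filter> filter =
      cv::cuda::createGaussianFilter (CV_8UC4, CV_8UC4, cv::Size (25, 25), 0);

  /* Warm up once so first-call overhead is not measured. */
  filter->apply (d_mat, d_mat);
  cudaDeviceSynchronize ();

  const int iters = 100;
  auto start = std::chrono::high_resolution_clock::now ();
  for (int i = 0; i < iters; i++)
    filter->apply (d_mat, d_mat);
  cudaDeviceSynchronize ();  /* wait for all GPU work before stopping the clock */
  auto stop = std::chrono::high_resolution_clock::now ();

  double sec_per_frame = std::chrono::duration<double> (stop - start).count () / iters;
  std::cout << "FPS: " << 1.0 / sec_per_frame << std::endl;
  return 0;
}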
Here is how I extract the GpuMat and blur it:
static GstFlowReturn
blur_frame (GpuBlurPure * gpublurpure, NvBufSurface * input_buf, gint idx)
{
  static guint src_width = GST_ROUND_UP_2 ((unsigned int) gpublurpure->video_info.width);
  static guint src_height = GST_ROUND_UP_2 ((unsigned int) gpublurpure->video_info.height);

  /* Prepare for getting the frame using EGL. */
  NvBufSurfaceMapEglImage (input_buf, 0);

  CUresult status;
  CUeglFrame eglFrame;
  CUgraphicsResource pResource = NULL;
  cudaFree (0);  /* establish the CUDA context in this thread */

  /* The intermediate buffer has only one frame. Hence the index is 0. */
  status = cuGraphicsEGLRegisterImage (&pResource,
      input_buf->surfaceList[idx].mappedAddr.eglImage,
      CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
  status = cuGraphicsResourceGetMappedEglFrame (&eglFrame, pResource, 0, 0);
  status = cuCtxSynchronize ();

  /* Get the GPU mat from the intermediate buffer's eglFrame. pPitch[0] is the
   * pointer to plane 0; pass the row pitch as the step so the GpuMat layout
   * is correct even when the pitch is wider than width * 4. */
  cv::cuda::GpuMat d_mat (src_height, src_width, CV_8UC4,
      eglFrame.frame.pPitch[0], eglFrame.pitch);

  /* Process the Mat or make changes to it. */
  auto single_start = std::chrono::high_resolution_clock::now ();
  gpublurpure->filter->apply (d_mat, d_mat);
  auto single_stop = std::chrono::high_resolution_clock::now ();
  /* The time difference here is about 0.016 seconds! */

  status = cuCtxSynchronize ();
  status = cuGraphicsUnregisterResource (pResource);

  /* Destroy the EGLImage. */
  NvBufSurfaceUnMapEglImage (input_buf, 0);

  auto duration = std::chrono::duration_cast<std::chrono::nanoseconds> (single_stop - single_start);
  float gpu_process_time = 1 / (duration.count () * 1E-9);
  std::cout << "FPS: " << gpu_process_time << std::endl;
  std::cout << "TIME: " << duration.count () * 1E-9 << std::endl;

  return GST_FLOW_OK;
}
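To cross-check the host-clock numbers, the same apply() call can also be timed with CUDA events (a sketch, not what I currently run; it reuses d_mat and the filter from blur_frame above, assumes the default stream, and needs cuda_runtime.h):

/* Sketch: measure the blur with CUDA events instead of the host clock. */
cudaEvent_t ev_start, ev_stop;
cudaEventCreate (&ev_start);
cudaEventCreate (&ev_stop);

cudaEventRecord (ev_start, 0);
gpublurpure->filter->apply (d_mat, d_mat);
cudaEventRecord (ev_stop, 0);
cudaEventSynchronize (ev_stop);  /* block until the stop event completes */

float ms = 0.0f;
cudaEventElapsedTime (&ms, ev_start, ev_stop);  /* elapsed GPU time in milliseconds */
std::cout << "GPU TIME (ms): " << ms << std::endl;

cudaEventDestroy (ev_start);
cudaEventDestroy (ev_stop);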
Here is how the blur_frame function is called (in the transform_ip() function):
batch_meta = gst_buffer_get_nvds_batch_meta (inbuf);
guint i = 0;
for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
    l_frame = l_frame->next) {
  /* Blur the frame */
  blur_frame (gpublurpure, surface, i);
  i++;
}
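For context, surface here is the NvBufSurface mapped from inbuf; in the stock gstdsexample this is done roughly like so (a sketch, assuming that part of the plugin is unchanged):

/* Sketch: obtaining the NvBufSurface in transform_ip(), as in gstdsexample. */
GstMapInfo in_map_info;
NvBufSurface *surface = NULL;

memset (&in_map_info, 0, sizeof (in_map_info));
if (!gst_buffer_map (inbuf, &in_map_info, GST_MAP_READ)) {
  g_print ("Error: Failed to map gst buffer\n");
  return GST_FLOW_ERROR;
}
surface = (NvBufSurface *) in_map_info.data;

/* ... the batch-meta loop above runs here ... */

gst_buffer_unmap (inbuf, &in_map_info);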
Here is a sample pipeline that works:
gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_720p.h264 ! \
m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvvideoconvert ! \
"video/x-raw(memory:NVMM),format=RGBA" ! gpublurpure ! nvvideoconvert ! x264enc ! filesink location=blurry.h264