Efficiently converting RGBA NvBufSurfaceParams to BGR cv::Mat


• Hardware Platform: Jetson Xavier NX
• JetPack Version: 5.0.2
• DeepStream Version: 6.1.1

I have a custom low-level tracker library which requires a BGR cv::Mat for the underlying tracker frame processing.

Currently the query is:

NvMOTStatus NvMOT_Query(uint16_t customConfigFilePathSize, char *pCustomConfigFilePath, NvMOTQuery *pQuery)
{
    pQuery->computeConfig = NVMOTCOMP_CPU;
    pQuery->numTransforms = 1;
    pQuery->colorFormats[0] = NVBUF_COLOR_FORMAT_RGBA;
    pQuery->memType = NVBUF_MEM_SURFACE_ARRAY;
    pQuery->batchMode = NvMOTBatchMode_NonBatch;
    pQuery->supportPastFrame = false;

    return NvMOTStatus_OK;
}

And frame processing:

NvMOTStatus NvMOTContext::processFrame(const NvMOTProcessParams *params, NvMOTTrackedObjBatch *trackedObjectsBatch)
{
	NvMOTTrackedObjList *trackedObjList = &trackedObjectsBatch->list[0];
	NvMOTFrame *frame = &params->frameList[0];
	NvBufSurfaceParams *bufferParams = frame->bufferList[0];

	// Wrap the mapped RGBA surface without copying; pitch is the row stride in bytes.
	cv::Mat rgbaFrame(bufferParams->height, bufferParams->width, CV_8UC4, bufferParams->mappedAddr.addr[0], bufferParams->pitch);
	// bgrFrame is the destination cv::Mat, declared elsewhere (not shown in this post).
	cv::cvtColor(rgbaFrame, bgrFrame, cv::COLOR_RGBA2BGR);
	// ... rest of the function omitted ...
}

This works, but it is far too CPU intensive (mainly cv::cvtColor, for 1280x720 frames at 25 fps).

Is there another way to tackle this, perhaps by using NVMOTCOMP_GPU and doing the RGBA-to-BGR conversion on the GPU?

Any help will be greatly appreciated. Thank you!

You may need to write a CUDA kernel for the COLOR_RGBA2BGR conversion to offload that work from the CPU to the GPU.
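For reference, the per-pixel mapping such a kernel would perform can be sketched in plain C++ on the CPU. This is only an illustration of the byte shuffle and the pitch handling, not a GPU implementation; the function name rgbaToBgr and its signature are hypothetical. Each iteration of the inner loop corresponds to the work a single GPU thread would do.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts a pitched RGBA8 image into a tightly packed
// BGR8 buffer. `pitch` is the source row stride in bytes and may be larger
// than width * 4 because of row padding.
std::vector<std::uint8_t> rgbaToBgr(const std::uint8_t *src, int width, int height, std::size_t pitch)
{
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(width) * height * 3);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t *row = src + static_cast<std::size_t>(y) * pitch;
        std::uint8_t *out = dst.data() + static_cast<std::size_t>(y) * width * 3;
        for (int x = 0; x < width; ++x) {
            // In a CUDA kernel, one thread would handle one (x, y) pixel.
            out[3 * x + 0] = row[4 * x + 2]; // B
            out[3 * x + 1] = row[4 * x + 1]; // G
            out[3 * x + 2] = row[4 * x + 0]; // R (alpha is dropped)
        }
    }
    return dst;
}
```

In a kernel, the two loops would be replaced by the thread and block indices, and both source and destination would have to be device-accessible memory.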

Hello @kesong,

It seems OpenCV (CUDA build) provides cv::cuda::cvtColor:

cv::cuda::GpuMat gpuMat(bufferParams->height, bufferParams->width, CV_8UC4,
      bufferParams->mappedAddr.addr[0], bufferParams->pitch);
cv::cuda::GpuMat gpuMat2;
cv::cuda::cvtColor(gpuMat, gpuMat2, cv::COLOR_RGBA2BGR);

But to supply it with a cv::cuda::GpuMat on Jetson, is it necessary to map the surface as an EGL image? Can you please provide an example?

Thank you for your support.

Please refer to "Implementing a Custom GStreamer Plugin with OpenCV Integration Example" in the DeepStream 6.3 Release documentation.

Hi @kesong,

My current problem is that the first cv::cuda::cvtColor call takes 170 ms, while subsequent calls take 4 ms.

I’m calling cudaSetDevice(config.miscConfig.gpuId) in NvMOTContext’s constructor. I’ve also tried calling cudaFree(0) there to force early initialization, but it didn’t help.

What should I do on initialization to eliminate this opencv-cuda-first-call-time issue?

Thank you for your support.

Can you profile the application with nsys and share the nsys output for analysis?

sudo /opt/nvidia/nsight_systems/nsys profile -t cuda,nvtx,nvmedia,osrt --accelerator-trace=nvmedia --show-output=true --force-overwrite=true --delay=20 --duration=30 --output=%p $APP_WITH_ITS_OPTIONS
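Independently of the profile, a common way to hide a one-time initialization cost is to run the operation once on dummy data during construction, so that any lazy setup is paid before the first real frame. Whether this removes the 170 ms seen here is an assumption that has not been verified in this thread. A minimal sketch of the pattern, where the class WarmedConverter and the convert callable are hypothetical stand-ins for the tracker context and the cv::cuda::cvtColor call:

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical wrapper: invokes the conversion once at construction (warm-up)
// so that later calls do not pay any one-time setup cost.
class WarmedConverter {
public:
    explicit WarmedConverter(std::function<void()> convert)
        : convert_(std::move(convert))
    {
        convert_(); // warm-up call; the result is discarded
    }

    // Runs the conversion once and returns the elapsed wall time in milliseconds.
    double runTimed() const
    {
        const auto start = std::chrono::steady_clock::now();
        convert_();
        const auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(stop - start).count();
    }

private:
    std::function<void()> convert_;
};
```

In the tracker, the callable would wrap the cv::cuda::cvtColor call on a small dummy GpuMat created in NvMOTContext's constructor, and runTimed could be used to compare the first and subsequent per-frame timings.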