Hello,
• Hardware Platform: Jetson Xavier NX
• JetPack Version: 5.0.2
• DeepStream Version: 6.1.1
I have a custom low-level tracker library that requires a BGR cv::Mat
for the underlying tracker frame processing.
Currently the query is:
NvMOTStatus NvMOT_Query(uint16_t customConfigFilePathSize, char *pCustomConfigFilePath, NvMOTQuery *pQuery)
{
    pQuery->computeConfig = NVMOTCOMP_CPU;
    pQuery->numTransforms = 1;
    pQuery->colorFormats[0] = NVBUF_COLOR_FORMAT_RGBA;
    pQuery->memType = NVBUF_MEM_SURFACE_ARRAY;
    pQuery->batchMode = NvMOTBatchMode_NonBatch;
    pQuery->supportPastFrame = false;
    return NvMOTStatus_OK;
}
And frame processing:
NvMOTStatus NvMOTContext::processFrame(const NvMOTProcessParams *params, NvMOTTrackedObjBatch *trackedObjectsBatch)
{
    NvMOTTrackedObjList *trackedObjList = &trackedObjectsBatch->list[0];
    NvMOTFrame *frame = &params->frameList[0];
    NvBufSurfaceParams *bufferParams = frame->bufferList[0];
    ...
    cv::Mat rbgaFrame(bufferParams->height, bufferParams->width, CV_8UC4, bufferParams->mappedAddr.addr[0], bufferParams->pitch);
    cv::cvtColor(rbgaFrame, bgrFrame, cv::COLOR_RGBA2BGR);
    ...
This works, but it is far too CPU intensive (mainly cv::cvtColor, for 1280x720 frames at 25 fps).
Is there any other way to tackle this? Maybe with NVMOTCOMP_GPU
and somehow doing the RGBA-to-BGR conversion on the GPU?
Any help will be greatly appreciated. Thank you!
kesong
November 10, 2023, 3:09am
3
Maybe you need to write a CUDA kernel for the COLOR_RGBA2BGR conversion to offload the work from the CPU to the GPU.
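For reference, the per-pixel work such a kernel would do is small: swap R and B, drop alpha, and respect the row pitch of both buffers. Below is a minimal sketch, not a tested DeepStream integration. The names rgbaToBgrPixel, rgbaToBgrReference and HOST_DEVICE are hypothetical and are not part of DeepStream or OpenCV. The host loop can serve as a CPU reference for checking GPU output on a few frames.

```cpp
#include <cstddef>
#include <cstdint>

// Under nvcc this marks the function as callable from host and device code;
// under a plain C++ compiler the macro expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: converts the pixel at (x, y) from a pitched RGBA
// buffer to a pitched BGR buffer. Pitches are in bytes and may be larger
// than width * bytes-per-pixel because of row padding.
HOST_DEVICE inline void rgbaToBgrPixel(const std::uint8_t *src, std::size_t srcPitch,
                                       std::uint8_t *dst, std::size_t dstPitch,
                                       int x, int y)
{
    const std::uint8_t *s = src + static_cast<std::size_t>(y) * srcPitch
                                + 4 * static_cast<std::size_t>(x);
    std::uint8_t *d = dst + static_cast<std::size_t>(y) * dstPitch
                          + 3 * static_cast<std::size_t>(x);
    d[0] = s[2]; // B
    d[1] = s[1]; // G
    d[2] = s[0]; // R (alpha in s[3] is dropped)
}

// Hypothetical host-side reference: applies the per-pixel mapping to a
// whole image. Padding bytes in the destination rows are left untouched.
inline void rgbaToBgrReference(const std::uint8_t *src, std::size_t srcPitch,
                               std::uint8_t *dst, std::size_t dstPitch,
                               int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            rgbaToBgrPixel(src, srcPitch, dst, dstPitch, x, y);
}
```

In a CUDA build, a kernel would call rgbaToBgrPixel once per thread, with x = blockIdx.x * blockDim.x + threadIdx.x and y = blockIdx.y * blockDim.y + threadIdx.y, after checking x < width and y < height. Both pointers would then have to be addresses the GPU can access.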
Hello @kesong ,
It seems OpenCV (CUDA build) provides cv::cuda::cvtColor:
cv::cuda::GpuMat gpuMat = cv::cuda::GpuMat(bufferParams->height, bufferParams->width, CV_8UC4,
bufferParams->mappedAddr.addr[0], bufferParams->pitch);
cv::cuda::cvtColor(gpuMat, gpuMat2, cv::COLOR_RGBA2BGR);
But to supply it with a cv::cuda::GpuMat, is it required (on Jetson) to manage an EGL image? Can you please provide an example?
Thank you for your support.
kesong
November 13, 2023, 8:05am
5
Hi @kesong ,
My current problem is that the first cv::cuda::cvtColor call takes 170 ms, while subsequent calls take 4 ms.
I’m calling cudaSetDevice(config.miscConfig.gpuId) in NvMOTContext’s constructor. I’ve also tried calling cudaFree(0) to force early initialization, but it didn’t help.
What should I do during initialization to eliminate this first-call delay in OpenCV CUDA?
Thank you for your support.
kesong
November 21, 2023, 9:01am
7
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
Can you profile the application with nsys? Can you share the nsys output for analysis?
sudo /opt/nvidia/nsight_systems/nsys profile -t cuda,nvtx,nvmedia,osrt --accelerator-trace=nvmedia --show-output=true --force-overwrite=true --delay=20 --duration=30 --output=%p $APP_WITH_ITS_OPTIONS