Rectification is slow

Hello, we have four Sekonix SF3324 cameras, and I am trying to rectify their images with dwRectifier_warpNvMedia. We are using the pinhole model.

Here is how I warp the images:

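```cpp
// Rectifies one camera frame into out_image using this wrapper's rectifier_.
bool UndistortWrapper::undistortImage(const dwImageHandle_t& in_image, dwImageNvMedia& out_image)
{
  // Get the NvMedia view of the DriveWorks image handle.
  dwImageNvMedia* in_image_nvmedia = nullptr;
  CHECK_DW_ERROR_ROS(dwImage_getNvMedia(&in_image_nvmedia, in_image));

  // Warp (rectify) the NvMedia input image into the NvMedia output image.
  CHECK_DW_ERROR_ROS(dwRectifier_warpNvMedia(&out_image, in_image_nvmedia, rectifier_));

  return true;
}
```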

And here is how I call it:

```cpp
undistortWrapper_->undistortImage(imageHandle_, *image_nvmedia);
```

However, this function is slow, and I cannot run it at 30 Hz. Checking CPU usage with htop, I see the process maxing out one CPU thread while processing images at only 8 Hz per camera.

I created a thread pool that rectifies the images in separate threads, which brings the rate up to 22 Hz, but still not 30 Hz.

Am I missing something here? Why is this function so slow, and why is the CPU under so much load? My understanding was that dwRectifier_warpNvMedia uses the Tegra VIC engine and should be fast.

Do you have any ideas for improving the speed so that we can reach a 30 Hz publishing rate?

Thanks!

Hi @maxandre.ogeret,

Do you mean that dwRectifier_warpNvMedia() is what causes the long processing time and high CPU usage? Could you provide some data from your observations (e.g., your image resolution and how long the function took)? Thanks.
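One way to collect the requested timing is to wrap the call in a small wall-clock timer. The sketch below is a generic helper, not part of DriveWorks; the function name `averageCallSeconds` is hypothetical, and the callable stands in for whatever is being measured.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: calls `fn` `iterations` times and returns the mean
// wall-clock duration of one call, in seconds. `fn` stands in for the work
// being measured, e.g. a lambda wrapping undistortImage().
double averageCallSeconds(const std::function<void()>& fn, int iterations)
{
  assert(iterations > 0 && "need at least one iteration");
  using Clock = std::chrono::steady_clock;  // monotonic clock, suited to measuring intervals
  double totalSeconds = 0.0;
  for (int i = 0; i < iterations; ++i) {
    const auto start = Clock::now();
    fn();
    const auto stop = Clock::now();
    totalSeconds += std::chrono::duration<double>(stop - start).count();
  }
  return totalSeconds / iterations;
}
```

For example, `averageCallSeconds([&] { undistortWrapper_->undistortImage(imageHandle_, *image_nvmedia); }, 100)` would average 100 calls. This measures only how long the call blocks the calling thread, not CPU load; htop or a profiler is still needed to see where CPU time goes.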

@VickNV

Do you mean that dwRectifier_warpNvMedia() is what causes the long processing time and high CPU usage?

Yes, that is what I believe. On average, the function takes 0.015583 s to run on one 1980x1208 image.

1 / 0.015583 s ≈ 64.17 images per second, which means that in practice fewer than two cameras can be polled and rectified at 30 Hz.
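The budget arithmetic can be written out as a small helper. This is a sketch under one stated assumption: all warps go through a single serialized stage, one image at a time, so capacity is simply 1 / latency. The names `WarpBudget`, `warpBudget`, and `maxLatencySeconds` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Capacity of a warp stage that processes one image at a time (assumed serialized).
struct WarpBudget {
  double imagesPerSecond;  // 1 / latency
  double camerasAtRate;    // how many cameras fit at targetHz (fractional)
  int wholeCameras;        // whole cameras that fit at targetHz
};

WarpBudget warpBudget(double latencySeconds, double targetHz)
{
  assert(latencySeconds > 0.0 && targetHz > 0.0);
  WarpBudget b{};
  b.imagesPerSecond = 1.0 / latencySeconds;
  b.camerasAtRate = b.imagesPerSecond / targetHz;
  b.wholeCameras = static_cast<int>(std::floor(b.camerasAtRate));
  return b;
}

// Largest per-image latency that still lets `cameras` streams run at
// targetHz when every warp goes through the same serialized stage.
double maxLatencySeconds(int cameras, double targetHz)
{
  assert(cameras > 0 && targetHz > 0.0);
  return 1.0 / (cameras * targetHz);
}
```

With the measured 0.015583 s, this gives about 64.2 images/s, or roughly 2.1 cameras at 30 Hz before any capture or publishing overhead. Four cameras at 30 Hz need 120 images/s, so each warp would have to finish in about 8.3 ms if the warps cannot overlap.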

How can your implementation reach 22 Hz with four cameras, then?

@VickNV

I implemented a thread pool in which each camera's rectification runs in its own thread. These are the results:

  • 1 camera: 30 Hz
  • 2 cameras: 30 Hz
  • 3 cameras: 28 Hz
  • 4 cameras: 22 Hz
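The thread pool code itself is not shown in the thread, so the following is only a minimal sketch of the one-thread-per-camera layout described, using plain `std::thread` rather than a pooled task queue. The `rectify` callback is a hypothetical stand-in for "read one frame and call undistortImage()", and it is assumed to be safe to call concurrently for different cameras.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Runs `framesPerCamera` calls of `rectify(cameraId)` for each camera, with
// one std::thread per camera, and returns the total number of frames handled.
int runPerCameraThreads(int cameraCount, int framesPerCamera,
                        const std::function<void(int)>& rectify)
{
  assert(cameraCount > 0 && framesPerCamera >= 0);
  std::atomic<int> processed{0};
  std::vector<std::thread> workers;
  workers.reserve(cameraCount);
  for (int id = 0; id < cameraCount; ++id) {
    // Each camera's frames stay in order on that camera's own thread.
    workers.emplace_back([&rectify, &processed, id, framesPerCamera] {
      for (int f = 0; f < framesPerCamera; ++f) {
        rectify(id);
        processed.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();  // wait until every camera loop has finished
  }
  return processed.load();
}
```

Threading like this only raises throughput if the per-frame calls can actually overlap; it does not reduce the CPU cost of each call.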

Without threading, I can run only one camera at 30 Hz. These are the results:

  • 1 camera: 30 Hz
  • 2 cameras: 14 Hz
  • 3 cameras: 12 Hz
  • 4 cameras: 8 Hz
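Converting the per-camera rates from the two lists above into total images per second makes the comparison easier. The sketch below simply multiplies camera count by per-camera rate; the function names are hypothetical and the rates are the ones reported in this thread.

```cpp
#include <cassert>
#include <cstdio>

// Aggregate images/s = number of cameras x per-camera rate (Hz).
double aggregateRate(int cameras, double perCameraHz)
{
  assert(cameras >= 0 && perCameraHz >= 0.0);
  return cameras * perCameraHz;
}

// Prints total throughput for the threaded and non-threaded runs reported above.
void printComparison()
{
  const double threaded[]   = {30.0, 30.0, 28.0, 22.0};
  const double sequential[] = {30.0, 14.0, 12.0, 8.0};
  for (int cams = 1; cams <= 4; ++cams) {
    std::printf("%d camera(s): threaded %.0f images/s, sequential %.0f images/s\n",
                cams, aggregateRate(cams, threaded[cams - 1]),
                aggregateRate(cams, sequential[cams - 1]));
  }
}
```

This gives 30, 60, 84, and 88 images/s with threading versus 30, 28, 36, and 32 images/s without. The sequential total stays near 30 images/s, roughly 33 ms per image, which may indicate that the loop spends time on more than the warp itself. The threaded total of 88 images/s exceeds the ~64 images/s a serialized 15.6 ms warp would allow, which suggests that concurrent calls overlap to some extent.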

We run rectification on the GPU, but I was thinking about moving a good part of our application's functionality over to NvMedia, so I am interested in this as well.

Have you used the Nsight Systems profiler? It can capture traces for NvMedia and will show you what is consuming the CPU.

Could you share your application for us to reproduce the problem with virtual cameras? Thanks.

When you say you use the GPU, I assume you mean dwRectifier_warp, which works on CUDA images, right? Do you measure a similar latency there?

No, I haven't. Which version should I install on the DRIVE AGX board? Are there any tutorials on how to install it and use it with the DRIVE AGX boards?

@VickNV Yes, of course. Here is the code:

https://tartuulikool-my.sharepoint.com/:u:/g/personal/ogeret_ut_ee/EYht3QC8QXZLibRd36lxEbgBVmlx8EVKWKY9Jje6UPfCjA?e=A0yqcq

I cannot send you the repository link because it is not public.

Thanks!

@VickNV Is there any more information on this?

@maxandre.ogeret,
Could you share detailed steps to reproduce your observations with this package? That will help us get started on this task quickly.