Lower FPS on NVIDIA Jetson Nano

For this benchmark I am using a Python wrapper of the AprilTag library: https://github.com/duckietown/lib-dt-apriltags (wrapping https://github.com/AprilRobotics/apriltag).

Intel NUC i3 CPU results:

  1. 30 FPS, AprilTag not in FOV
  2. 30 FPS, AprilTag in FOV

NVIDIA Jetson Nano results:

  1. 22 FPS, AprilTag not in FOV
  2. 5 FPS, AprilTag in FOV

The code is identical on both machines.

Hi,
Please execute sudo tegrastats and check the system status while running the application. You can read about the tegrastats utility at
https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/AppendixTegraStats.html#

If the application is CPU-based, the bottleneck is probably the CPU capability of the Jetson Nano.

My example code uses the CPU, not the GPU.

So my question is: how can I utilize the 128-core CUDA GPU from the AprilTag detection Python wrapper?

Hi,
An optimal solution for running deep learning inference is the DeepStream SDK. Please check
NVIDIA Metropolis Documentation

By default it demonstrates models such as ResNet10, YOLO, … We suggest trying the default sample first and then applying your own model.

Thank you for the suggestions, but the AprilTag detection code is a simple CV- and math-based task, so how would the DeepStream library help in such a case?

Hi,
You can check tegrastats to make sure the bottleneck is the CPU. If so, a possible solution is to port the function to CUDA so it runs on the GPU engine.

Other users who have run these samples on Jetson platforms may also be able to share suggestions.

Output of tegrastats:

RAM 2692/3956MB (lfb 3x2MB) SWAP 486/1978MB (cached 55MB) IRAM 0/252kB(lfb 252kB) CPU [21%@1224,21%@1224,12%@1224,7%@1224] EMC_FREQ 8%@1600 GR3D_FREQ 0%@153 NVJPG 627 VIC_FREQ 9%@627 APE 25 PLL@26C CPU@28C PMIC@100C GPU@25C AO@35.5C thermal@26.5C POM_5V_IN 3059/3059 POM_5V_GPU 80/80 POM_5V_CPU 724/724
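The GR3D_FREQ field in that line is the GPU load, and 0% confirms the detector never touches the GPU. To watch this over time, a small parser can be run over the tegrastats output; the sketch below (parse_tegrastats is a hypothetical helper name, and field layout can vary between L4T releases) extracts the per-core CPU loads and the GPU load from a line like the one above:

```python
import re

def parse_tegrastats(line):
    """Pull per-core CPU loads and GPU (GR3D) load out of one tegrastats line."""
    core_list = re.search(r"CPU \[([^\]]+)\]", line).group(1)
    cpus = [int(p) for p in re.findall(r"(\d+)%@\d+", core_list)]
    gpu = int(re.search(r"GR3D_FREQ (\d+)%", line).group(1))
    return cpus, gpu

# Abbreviated sample taken from the output above.
sample = ("RAM 2692/3956MB (lfb 3x2MB) CPU [21%@1224,21%@1224,12%@1224,7%@1224] "
          "EMC_FREQ 8%@1600 GR3D_FREQ 0%@153")
cpus, gpu = parse_tegrastats(sample)
print(cpus, gpu)  # GR3D_FREQ at 0% means the GPU is idle
```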

Hi MaharshiOza,
What resolution are you using for the camera, and how is your pipeline configured?

Cam: Logitech C930E
Resolution: 1920 x 1080 MJPG Compressed

Pipeline Configuration : v4l2src device=/dev/video1 io-mode=2 ! image/jpeg,width=1920,height=1080,framerate=30/1,format=MJPG ! jpegdec ! videoconvert ! video/x-raw,format=BGR ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
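Note that the jpegdec element in that pipeline decodes 1080p MJPG in software, which costs CPU on the Nano. As a hedged alternative (element availability and caps depend on your L4T release), the Nano's hardware JPEG decoder can be tried via nvjpegdec:

```shell
gst-launch-1.0 v4l2src device=/dev/video1 io-mode=2 \
  ! image/jpeg,width=1920,height=1080,framerate=30/1 \
  ! nvjpegdec ! video/x-raw ! videoconvert ! video/x-raw,format=BGR \
  ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
```

This only offloads the JPEG decode step; the AprilTag detection itself still runs wherever the wrapper runs it.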

Does the code use OpenCV?
If yes, check whether you have CUDA-enabled OpenCV on your Jetson Nano.

Yes, I am working on that; I will post results soon.

Tested with CUDA-enabled OpenCV, but the results are the same.

Photo for reference showing that OpenCV installed properly with CUDA enabled.

Output with jetson_clocks enabled:

In the image, I noticed that the GPU still shows 0% usage.

Image:

Deploying OpenCV with CUDA does not make your application automagically use the GPU. Did you adapt your cv2 program code to utilize the GPU, e.g. using cuda::GpuMat instead of Mat, setting the DNN backend/target to CUDA, …?

Can you suggest how can I implement it in the following codes?

  1. https://github.com/duckietown/lib-dt-apriltags (Python wrapper)
  2. https://github.com/AprilRobotics/apriltag (C++ Code)

I think you don’t need to change the Python wrapper. It’s been a while since I coded in C++, so I can’t help you there, sorry.
Maybe this helps: Get started with OpenCV CUDA C++ · GitHub

Great, thank you.

CUDA and OpenCV are both installed properly, but I am getting this error:

cudaTest.cpp:7:10: fatal error: opencv2/cudaarithm.hpp: No such file or directory
 #include <opencv2/cudaarithm.hpp>
          ^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

You must rewrite your OpenCV code to use OpenCV’s CUDA functionality (e.g. cv::foo(image) becomes cv::cuda::foo(image)), or use DeepStream to do what you want. OpenCV is probably never going to be as performant as DeepStream on Tegra, since it often requires a bunch of copies between CPU and GPU to get anything done, and cv::cuda lacks a bunch of basic functionality.

You will have to build it yourself, since CUDA support is not built into the OpenCV build provided on Tegra. If you run opencv_version --verbose you’ll probably see NVIDIA CUDA: NO. The best advice I can give is not to use OpenCV on Tegra at all if you can avoid it; it only performs well on powerful x86 CPUs, or for very simple things like blob detection on ARM.
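For completeness, here is a hedged sketch of the CMake configuration typically used when building OpenCV with CUDA from source on the Nano. 5.3 is the Nano's GPU compute capability; the opencv_contrib path is an assumption about your checkout layout (cudaarithm.hpp, the header from the error above, lives in the contrib modules):

```shell
# Run from a build/ directory inside the OpenCV source tree (assumed layout).
cmake -D CMAKE_BUILD_TYPE=Release \
      -D WITH_CUDA=ON \
      -D CUDA_ARCH_BIN=5.3 \
      -D CUDA_ARCH_PTX="" \
      -D OPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules \
      ..
# Afterwards, opencv_version --verbose should report: NVIDIA CUDA: YES
```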

