Very bad performance, or is something wrong?

mechamania · April 26, 2023, 3:54pm

Hi,
I managed to run my first experiments. I connected two cameras to the Jetson Nano, both 1600x1300 at 60 fps. I use OpenCV and wrote a C++ loop that just reads frames into a buffer. It doesn’t do anything with that buffer yet.
When I run this and check the system monitor, I see that 3 of the 4 CPU cores are about 70% busy! But I’m doing nothing with the frames! The CPUs should be mostly idle. Only data is being moved, and that is probably done with DMA.
I need to write some code that will analyse these frames, but it seems I will not have enough CPU power left for that.
The GPU is also busy (fluctuating between 0 and 80%)! Why?
Finally, 80% of the 4 GB RAM is in use! Yes, I have big images, but one image is at most 1600x1300x2 bytes, so about 4 MB. I assume only 1 or 2 frames are stored per camera (otherwise the latency would be very bad).

Is the conclusion that a Jetson Nano is way too weak to handle this??
I was planning to switch to Jetson Orin Nano later, but then with 6 cameras!
I hope I’m doing something wrong and this can be improved a lot.
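
As a rough check of the numbers in this post, the sketch below computes the size of one uncompressed frame and the raw data rate for both cameras. The helper names frameBytes and streamBytesPerSecond are made up for illustration; the resolution, frame rate, and camera count are taken from the post. It only covers raw pixel data and says nothing about how many buffers the capture stack actually allocates.

```cpp
#include <cstdint>

// Bytes needed for one uncompressed frame.
// (Hypothetical helper, not part of OpenCV.)
constexpr std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Raw bytes per second for `cameras` identical streams at `fps`.
// (Hypothetical helper, not part of OpenCV.)
constexpr std::uint64_t streamBytesPerSecond(std::uint64_t bytesPerFrame,
                                             std::uint64_t fps,
                                             std::uint64_t cameras) {
    return bytesPerFrame * fps * cameras;
}
```

For 1600x1300 at 2 bytes per pixel, frameBytes gives 4,160,000 bytes, which matches the estimate of about 4 MB. Two 8-bit streams at 60 fps come to 249,600,000 bytes per second of raw data.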

Honey_Patouceul · April 26, 2023, 4:49pm

Obviously the ARM cores of Nano (derived from TX1) are not as efficient as recent desktop CPU cores.

So for this pixel rate you should avoid CPU processing as much as possible. The Jetson has dedicated hardware for video scaling, format conversion, and encoding/decoding, plus a GPU for image processing.

It would help if you explained your case in more detail:

What kind of cameras (CSI or USB) and which formats they provide.
Share the code you’re using with OpenCV (opening the cameras, reading from them, displaying). Also share the output of the OpenCV function getBuildInformation().
For better advice, tell us what kind of processing you intend to do from source to sink.

mechamania · April 26, 2023, 5:38pm

The thing is that I’m not doing any processing at all! CPUs should be idle.

This is the OpenCV code:

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

int main(int argc, char** argv )
{
    VideoCapture cap1;
    Mat CameraFrame1;
    VideoCapture cap2;
    Mat CameraFrame2;

    cap1.open(0);
    cap2.open(1);

    // Check whether user selected camera is opened successfully.
    if (!cap1.isOpened())
    {
        cout << "***Could not initialize capturing...***\n";
        return -1;
    }
    if (!cap2.isOpened())
    {
        cout << "***Could not initialize capturing...***\n";
        return -1;
    }

    // Loop infinitely to fetch frame from cameras.
    int frame = 0;
    for (;;)
    {
        cap1 >> CameraFrame1;
        cap2 >> CameraFrame2;

        // Check whether received frame has valid pointer.
        if (CameraFrame1.empty())
            break;
        if (CameraFrame2.empty())
            break;

        cout << "frame" << frame << endl;
        frame++;

        // Wait for Escape keyevent to exit from loop
        char keypressed = (char)waitKey(10);
        if (keypressed == 27)
            break;
    }

    cap1.release();
    cap2.release();

    return 0;
}

Output of getBuildInformation():


General configuration for OpenCV 4.1.1 =====================================
  Version control:               4.1.1-2-gd5a58aa75

  Platform:
    Timestamp:                   2019-12-13T17:25:11Z
    Host:                        Linux 4.9.140-tegra aarch64
    CMake:                       3.10.2
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               Release

  CPU/HW features:
    Baseline:                    NEON FP16
      required:                  NEON
      disabled:                  VFPV3

  C/C++:
    Built as dynamic libs?:      YES
    C++ Compiler:                /usr/bin/c++  (ver 7.4.0)
    C++ flags (Release):         -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,--gc-sections  
    Linker flags (Debug):        -Wl,--gc-sections  
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          dl m pthread rt
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python2 python3 stitching ts video videoio
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 java js
    Applications:                tests perf_tests examples apps
    Documentation:               NO
    Non-free algorithms:         NO

  GUI: 
    GTK+:                        YES (ver 2.24.32)
      GThread :                  YES (ver 2.56.4)
      GtkGlExt:                  NO

  Media I/O: 
    ZLib:                        /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
    JPEG:                        /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
    WEBP:                        build (ver encoder: 0x020e)
    PNG:                         /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.34)
    TIFF:                        /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.0.9)
    JPEG 2000:                   build (ver 1.900.1)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES
      avcodec:                   YES (57.107.100)
      avformat:                  YES (57.83.100)
      avutil:                    YES (55.78.100)
      swscale:                   YES (4.8.100)
      avresample:                NO
    GStreamer:                   YES (1.14.5)
    v4l/v4l2:                    YES (linux/videodev2.h)

  Parallel framework:            TBB (ver 2017.0 interface 9107)

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Lapack:                      NO
    Eigen:                       YES (ver 3.3.4)
    Custom HAL:                  YES (carotene (ver 0.0.1))
    Protobuf:                    build (3.5.1)

  Python 2:
    Interpreter:                 /usr/bin/python2.7 (ver 2.7.15)
    Libraries:                   /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.15+)
    numpy:                       /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.13.3)
    install path:                lib/python2.7/dist-packages/cv2/python-2.7

  Python 3:
    Interpreter:                 /usr/bin/python3 (ver 3.6.9)
    Libraries:                   /usr/lib/aarch64-linux-gnu/libpython3.6m.so (ver 3.6.9)
    numpy:                       /usr/lib/python3/dist-packages/numpy/core/include (ver 1.13.3)
    install path:                lib/python3.6/dist-packages/cv2/python-3.6

  Python (for build):            /usr/bin/python2.7

  Java:                          
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    /usr
-----------------------------------------------------------------



What I need to do:
We have a C++ algorithm that detects dots and extracts pose information from them. It runs on an x86 CPU (2.5 GHz) within 1 ms per frame (on one thread). So I assume that on the Jetson Nano I should be able to do that within 16 ms.

My biggest concern now is that the CPUs are at 70% load while they are doing nothing.
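
The 16 ms figure can be checked with a little arithmetic. The sketch below is only an illustration: frameBudgetMs and budgetUsed are hypothetical helper names, and it assumes the frames of all cameras are processed one after another on a single thread. How much slower the Nano really is than the x86 machine is not known from this thread.

```cpp
// Time available per frame, in milliseconds, when `cameras` streams at `fps`
// are processed sequentially on one thread. (Hypothetical helper.)
constexpr double frameBudgetMs(double fps, int cameras) {
    return 1000.0 / (fps * cameras);
}

// Fraction of that budget used by an algorithm that takes `algoMs` per frame.
// A value above 1.0 means the algorithm cannot keep up. (Hypothetical helper.)
constexpr double budgetUsed(double algoMs, double fps, int cameras) {
    return algoMs / frameBudgetMs(fps, cameras);
}
```

At 60 fps one camera leaves about 16.7 ms per frame, so a 16 ms algorithm would use about 96% of the budget. With two cameras handled on the same thread, the budget drops to about 8.3 ms per frame.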

Honey_Patouceul · April 26, 2023, 6:06pm

This probably uses the V4L or FFMPEG backend, which may explain the CPU load. Since your OpenCV build also supports GStreamer, you may give that a try. You can list the modes your cameras provide with:

sudo apt install v4l-utils

v4l2-ctl -d0 --list-formats-ext
v4l2-ctl -d1 --list-formats-ext

You may also boost your Jetson’s power mode (nvpmodel) and clocks:

sudo nvpmodel -m 0
sudo jetson_clocks

mechamania · April 26, 2023, 7:05pm

Both cameras only support 8 or 16 bit greyscale. No compression.

ioctl: VIDIOC_ENUM_FMT
	Index       : 0
	Type        : Video Capture
	Pixel Format: 'GREY'
	Name        : 8-bit Greyscale
		Size: Discrete 640x480
			Interval: Discrete 0.006s (180.000 fps)
		Size: Discrete 1280x720
			Interval: Discrete 0.011s (90.000 fps)
		Size: Discrete 320x240
			Interval: Discrete 0.004s (280.000 fps)
		Size: Discrete 1600x1300
			Interval: Discrete 0.017s (60.000 fps)

	Index       : 1
	Type        : Video Capture
	Pixel Format: 'Y16 '
	Name        : 16-bit Greyscale
		Size: Discrete 640x480
			Interval: Discrete 0.006s (180.000 fps)
		Size: Discrete 1280x720
			Interval: Discrete 0.011s (90.000 fps)
		Size: Discrete 320x240
			Interval: Discrete 0.004s (280.000 fps)
		Size: Discrete 1600x1300
			Interval: Discrete 0.025s (40.000 fps)

Running

sudo nvpmodel -m 0
sudo jetson_clocks

makes no difference. What do they do?

Question:
Does OpenCV use the CPU to copy frame buffers around? Or is that done using DMA controllers?

Honey_Patouceul · April 26, 2023, 7:38pm

nvpmodel -m 0 would select MAXN profile (max performance)
jetson_clocks would boost CPU, GPU and memory clocks.

Does this work ?

cv::VideoCapture cap1("v4l2src device=/dev/video0 ! video/x-raw,format=GRAY8,width=1600,height=1300,framerate=60/1 ! appsink drop=1", cv::CAP_GRSTREAMER);

# Or
cv::VideoCapture cap1("v4l2src device=/dev/video0 io-mode=2 ! video/x-raw,format=GRAY8,width=1600,height=1300,framerate=60/1 ! appsink drop=1", cv::CAP_GRSTREAMER);

mechamania · April 26, 2023, 7:47pm

Nope,

Both make no difference.

(I assume you meant CAP_GSTREAMER at the end)

mechamania · April 26, 2023, 7:57pm

I tried

cv::VideoCapture cap1("v4l2src device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=480,framerate=60/1 ! appsink drop=1", cv::CAP_GSTREAMER);

(lower resolution)
to see if it gets better. But the resolution still becomes 1600x1300. So is the line not working at all?

P.S.: where can I find the syntax of this configuration string?
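
On the syntax question: the string is a GStreamer pipeline description, the same syntax that the gst-launch-1.0 command line tool accepts. Elements are separated by "!", element properties are written as key=value, and a caps filter such as video/x-raw,format=GRAY8,... restricts the format between two elements. The sketch below builds such a string from its parts. makeV4l2Pipeline is a hypothetical helper, not an OpenCV or GStreamer function.

```cpp
#include <string>

// Builds a pipeline description of the form used in this thread:
//   source element ! caps filter ! sink element
// (Hypothetical helper for illustration only.)
std::string makeV4l2Pipeline(const std::string& device, const std::string& format,
                             int width, int height, int fps) {
    return "v4l2src device=" + device +                  // element with one property
           " ! video/x-raw,format=" + format +           // caps: raw video, pixel format
           ",width=" + std::to_string(width) +
           ",height=" + std::to_string(height) +
           ",framerate=" + std::to_string(fps) + "/1" +  // frame rate is a fraction
           " ! appsink drop=1";                          // sink that OpenCV reads from
}

// Possible use with OpenCV (needs <opencv2/videoio.hpp> and a connected camera):
//   cv::VideoCapture cap(makeV4l2Pipeline("/dev/video0", "GRAY8", 640, 480, 180),
//                        cv::CAP_GSTREAMER);
```

The usage comment asks for 180 fps because the v4l2-ctl output above lists 640x480 only at 180 fps. A caps filter with framerate=60/1 at that resolution may not match any camera mode, but that is a guess and not confirmed in this thread. The properties of an element can be listed with gst-inspect-1.0, for example gst-inspect-1.0 v4l2src.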

mechamania · April 26, 2023, 8:25pm

I could limit the resolution this way:

    cap1.open(0);
    cap2.open(1);

    cap1.set(cv::CAP_PROP_FRAME_WIDTH, 640);
    cap1.set(cv::CAP_PROP_FRAME_HEIGHT, 480);

    cap2.set(cv::CAP_PROP_FRAME_WIDTH, 640);
    cap2.set(cv::CAP_PROP_FRAME_HEIGHT, 480);

This should result in roughly a factor of 7 fewer pixels (307,200 instead of 2,080,000).
But now all cores are above 80%!
I’m very confused now.

Note that I called cap.set() after opening the VideoCapture. Otherwise it did not result in smaller images.
I’m not sure if that is correct.

mechamania · April 30, 2023, 12:22pm

I did find out that there is an unwanted conversion going on from greyscale to RGB, and I am not able to prevent it by setting the format.
But I guess I should make a new thread for this problem.
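
For anyone following up on the conversion issue: OpenCV exposes the capture properties cv::CAP_PROP_FOURCC and cv::CAP_PROP_CONVERT_RGB, which are the usual knobs for requesting a pixel format and for turning off color conversion. Whether the V4L2 backend of OpenCV 4.1.1 honors them for the GREY format is not confirmed here, and the post above suggests that setting the format alone did not help. The sketch shows how a FOURCC code such as 'GREY' is packed into an integer, using the same byte order as the v4l2_fourcc macro. The function name fourcc below is a local helper, not the OpenCV one.

```cpp
#include <cstdint>

// Packs four characters into a 32-bit FOURCC code, first character in the
// least significant byte. (Local helper for illustration.)
constexpr std::uint32_t fourcc(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))        |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)  |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Possible use with OpenCV (untested assumption, needs a connected camera):
//   cap1.set(cv::CAP_PROP_FOURCC, fourcc('G', 'R', 'E', 'Y'));
//   cap1.set(cv::CAP_PROP_CONVERT_RGB, 0);  // request unconverted frames
//   // After reading a frame, CameraFrame1.channels() == 1 would indicate
//   // that no conversion to a 3-channel image took place.
```

Checking channels() and type() on a captured Mat is a cheap way to see whether the backend delivered the frame unconverted.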

system · May 16, 2023, 8:00am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
