Performance on PCI-e USB card

I happened to have a 4-channel USB 3.0 PCIe x4 expansion card, SSU_TECH su_u3208.v1. I couldn’t make it run on a Linux PC (it supports Windows only), but surprisingly it runs quite well on the AGX Xavier.
I have a C++ application using OpenCV VideoCapture() which is well threaded to run and process 4 cameras simultaneously. I plugged in 4 USB 2.0 AR0144 (1MP) cams, but I get only 35 FPS @ power mode ID 7, 32 FPS @ power mode ID 3 and 60 FPS @ power mode ID 0. Obviously, the CPU frequency is essential, which raises the question of the maximum performance that can be obtained with 4 cams in this setup. For example, can the AGX Xavier run 4 AR0234 2MP cams @ 120 FPS via the PCIe card (assuming the PCIe x8 Gen4 bandwidth should not be the bottleneck)? Does MIPI improve performance significantly (keeping in mind that CPU power is still required), and can it achieve 4 cams @ 2MP @ 120 FPS? Finally, OpenCV VideoCapture() obviously introduces some overhead. What is the cheapest way to capture in C++?

Jetson AGX offers the PCIe spec-defined bandwidth. You mentioned that the card is a PCIe x4 card, but the max speed supported by the card is not given. So, based on the bandwidth requirement and the speed supported by the card, the math should make it obvious whether your requirement can be satisfied.

Hi @vidyas

The card supports a max speed of 5 Gbps. I intentionally didn’t include this, because I know how to do the math, and I assume I can swap it for a faster card.

Looking at this specific card and setup, the 4 AR0144 cams theoretically consume about 2.7 Gbps, and I don’t understand why I observe 35 FPS @ power mode ID 7 and 32 FPS @ power mode ID 3. Can you elaborate on this, please?

Back to my previous questions - should I expect any difference between MIPI cams and PCIe cams, assuming no bottleneck at the PCIe slot?

What is the fastest way to capture frames in C++?

Finally, this is a technical question - I know from @JerryChang that

there’re two threads for frame captures, it’s a thread to enqueue sensor frame into capture buffer, another thread to dequeue for the user-space.

and I am doing something similar in my code - I have a producer-consumer design pattern populating a circular queue with frames and processing each frame content using a thread pool:

thread masterCameraThread(&Module_Camera::read, ref(masterCamera), ref(headPtr), ref(cvr1), ref(mr1), ref(rr1), ref(pr1), ref(quit));

pool.AddJob([&, hptr](size_t taskIndex) mutable{masterCamera.processFrame(hptr);});

Now, the problem is I have 4 cams, which use quite a lot of threads given the 8 physical CPU cores of the AGX Xavier. Obviously, that might cause performance issues, and there is room for optimization here. I am trying to figure out the best approach to follow based on how the AGX Xavier works. Can you please advise?

I’m sorry but I don’t know much about these ‘power mode ID 7’ and ‘power mode ID 3’?
I’ll loop in someone who knows about the camera stuff.

Thanks! I am talking about Power Modes 0 - 7 from the Tegra Linux Driver package.

Regarding my last question - perhaps it would be a good idea to update vi5_fops.c like this:

static int tegra_channel_kthread_capture_dequeue(void *data)
{
        while (1) {
...
                        buf = dequeue_buffer(chan);
                        if (!buf)
                                break;

                        processFrame(hptr); /* proposed addition */

                        vi5_capture_dequeue(chan, buf);
                }

However, I see no synchronization between cams in here - which I do on my side. Can you please advise the best way to implement this synchronization - i.e. trigger all cams together? Thanks!

hello nouuata,

jumping into this thread - may I know what’s the actual use case for 4-cam synchronization?
it’s suggested to use both hardware and software approaches to achieve the synchronization use case.
please refer to Keeping camera synchronization at software level - #5 by JerryChang as a see-also.
thanks

Hi JerryChang,

Thanks for your reply! The use case is high-FPS active stereo vision using multiple cameras and light sources.

I don’t have a hardware background, but according to my research on the NVIDIA partners solutions, hardware synchronization reduces the FPS significantly. This is not acceptable for the problem I am solving, so I am looking at a high FPS software synchronization. The goal is each cam to fill up its buffer of frames and then use a software synchronization (or even frame approximation if needed) to recover stereo properties of the objects of interest. Thanks for pointing out to syncSensor and getSensorTimestamp() . I will try to understand how to sync this approach with the light sources and I will raise questions if needed.

The project requires cameras with 1-meter cables, so I am trying to stay away from MIPI cameras at the moment unless there is a significant benefit to using MIPI (SerDes) instead of USB. I am a bit concerned whether I can make 4 cams @ 120 FPS @ 1MP work over PCIe, due to the observation I mentioned above, which I don’t understand:

I get only 35 FPS @ power mode (nvpmodel) 7, 32 FPS @ power mode 3 and 60 FPS @ power mode 0 (MAXN)

Also, should I prefer the Argus API over OpenCV? I am looking for the least overhead.

hello nouuata,

you cannot use the Argus API to access USB cameras,
please check Camera Architecture Stack; libargus supports MIPI (Bayer) sensors only.

may I know which JetPack release you’re using?
please share the command line; also, what’s the frame-rate capability reported by gstreamer?
furthermore, could you please try to exclude multiple-camera access - for example, can you get a better frame-rate result with a single-camera use case?
thanks

Hi JerryChang,

may I know which JetPack release you’re using?

JetPack 4.4.1

please share the command line; also, what’s the frame-rate capability reported by gstreamer?

v4l2-ctl --set-fmt-video=width=1280,height=720,pixelformat=RG12 --stream-mmap -d /dev/video0 --stream-count=600 --stream-to=v4l2.rggb
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 60.00 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 60.00 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 59.80 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 59.85 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 59.88 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 59.83 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 59.85 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 59.87 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 59.82 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 59.84 fps

furthermore, could you please try to exclude multiple-camera access - for example, can you get a better frame-rate result with a single-camera use case?

Yes, switching off cams improves the FPS. For example, two cams can do 60 FPS in any power mode I tried.

The PCIe card can do 5 Gbps per channel, so 20 Gbps total. This should be enough for 4x AR0234 @ 2.4MP @ 120 fps, unless there is a bottleneck at the CPU, which I am trying to figure out.

This is what the code does:

  • Init 4 threads waiting for a notification to retrieve the frame from the corresponding camera
  • Init a thread pool waiting for a notification to process frames
  • Loop:
    ◦ the main thread grabs frames from all cams and notifies the retrieve threads
    ◦ the 4 threads retrieve the frame from the corresponding cam and notify the main thread
    ◦ the main thread sends jobs to the pool to process the retrieved frames and continues

For the purpose of testing, processFrame() currently does nothing. I assume I can offload the processFrame job to the GPU, so that is fine.

The measured performance is:

  • nvpmodel 3:

all fps: 35.8295 / 60
all fps: 35.8166 / 60
all fps: 35.6761 / 60

  • nvpmodel 7:

all fps: 57.4713 / 60
all fps: 58.548 / 60
all fps: 58.1058 / 60

  • nvpmodel 0:

all fps: 59.4884 / 60
all fps: 59.8802 / 60
all fps: 59.8802 / 60

4 threads @ 2188 MHz (nvpmodel -m 7) perform better than 8 threads @ 1200 MHz (nvpmodel -m 3), which raises my concerns.

I am trying to understand why this is happening in the first place, and to figure out whether I can obtain 4x 2.4MP @ 120 fps through PCIe USB or whether I should be looking at MIPI.

hello nouuata,

since you’re having 120-fps sensor,
could you please exclude --stream-to options for checking the sensor stream capability.
for example,
v4l2-ctl --set-fmt-video=width=1280,height=720,pixelformat=RG12 --stream-mmap -d /dev/video0 --stream-count=600

Hi JerryChang,

This is due to video compression on the camera modules - they support MJPG only. I figured it out. Thanks!

One final question - AGX Xavier has a HEVC decoder. Can it be used with USB cams?

Hi,

There is no existing sample for this use case, but it should work by integrating 12_camera_v4l2_cuda + 00_video_decode. 12_camera_v4l2_cuda demonstrates capturing an MJPEG stream through v4l2 and doing JPEG decoding; 00_video_decode demonstrates video decoding.