Xavier NX 16 and 4 cameras with jetson-inference - some common questions

When using threading, I get almost the same result - the only difference is that the GPU is constantly at 99% (before, it fluctuated around 90%). :)

It means that my proposition was wrong - about getting better results by using resources in parallel (though I did squeeze everything out). I'm short on CPU and GPU resources, and my model is slow.

Dusty, what do you think - is it possible to get my model running in about 3 ms, i.e. around 0.0035 sec per inference?

Xavier also has the DLA engines (Deep Learning Accelerators) which can offload models from the GPU, but they aren’t typically faster than GPU (they are optimized for power efficiency) and they may not support all the layers of SSD-Mobilenet. I don’t typically run the DLAs with jetson-inference, but DeepStream does. In reality if you want the maximum performance, you would migrate to DeepStream and use a TAO-trained model.

Other ways to speed up the model: reduce its resolution, reduce the number of classes, use INT8, or use a different network architecture (YOLO is common but would require code changes, and I don't think it would be significantly faster).
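To put the 3 ms target in context, here is a back-of-the-envelope sketch of the per-inference time budget, assuming the setup described in this thread (4 cameras at 60 FPS sharing one GPU that runs inferences serially; the exact numbers are assumptions, not measurements from the thread):

```python
# Assumed setup from the thread: 4 cameras, 60 FPS each, one GPU,
# inferences run back-to-back (no batching).
cameras = 4
fps = 60

total_inferences_per_sec = cameras * fps          # 240 inferences/sec
budget_ms = 1000.0 / total_inferences_per_sec     # ~4.17 ms per inference

print(f"Inferences needed per second: {total_inferences_per_sec}")
print(f"Time budget per inference:    {budget_ms:.2f} ms")

# A ~3 ms model would leave a little headroom for pre/post-processing:
model_ms = 3.0
print(f"Headroom with a 3 ms model:   {budget_ms - model_ms:.2f} ms")
```

So the ~3 ms goal is roughly what serial inference on all 4 streams would require; batching the 4 frames together would relax this per-frame constraint.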


Thanks for the explanation. I'm currently researching how I can convert the model to INT8, or how to train it directly in INT8. I hope it will run with detectnet :)

It feels (and feeling is not the best instrument for judging this, but…) that even with DeepStream I won't get the required speed - but I could with INT8.

I registered for: DLI Instructor-Led Workshop - Fundamentals of Deep Learning [DLIW52079] - maybe I'll get more basic insights from there.

Until then, I'll try whether this helps: intel/neural-compressor (github.com) - Intel® Neural Compressor (formerly the Intel® Low Precision Optimization Tool), which provides unified APIs for network compression techniques such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks.

And if model quantization doesn't help, I'll move to x86 hardware or look into DeepStream and C++.

Can you give me a good hint on how to start with TAO detectnet_v2 models? I have my dataset in CVAT, so I can export it to any format from there. Do you know any good examples or tutorials for getting started with TAO?

I’d check out these getting started resources for TAO:

It looks like CVAT can export to KITTI data format and TAO can import KITTI format. I’m not an expert on using TAO though, so if you encounter issues there you’ll probably want to open a new topic on the TAO forum.


Thanks, yes, I understand. Your help was already more than enough, thank you.

I sent them a question. Will let you know.

Any ideas where to get an mb1-ssd-lite pretrained model? mb1-ssd-lite should be better than mb1-ssd?

By the way, while streaming 4 x 1920x1080@60FPS through the encoder, CPU usage is about the same, around 90%. So I'm thinking of moving inference to a server where I have 1070 cards.

The ones they have for it are on: https://github.com/qfgaohao/pytorch-ssd
It looks like there is mb2-ssd-lite: https://github.com/qfgaohao/pytorch-ssd#mobilenetv2-ssd-lite

Yes, thanks, that I know.

The A205 can be partially compatible with Orin NX/Nano, but some features will be limited.
I suggest you buy the new carrier board we launched in April directly, which is specially designed for the Orin series.

I'll try to find out which features will be limited.

In case anyone is interested, I sent the CSI camera stream to a PC with: video-viewer --bitrate=8000000 csi://0 rtp://10.199.1.13:5000 --output-codec=h265 --input-width=1920 --input-height=1080 --input-rate=60 (but it doesn't output 60 FPS, only 30 when checked).

And on the PC I ran detectnet: detectnet rtp://@:5000 file://tere.mp4 --headless --input-codec=h265 --batch-size=8 --model=ssd-mobilnet-v2.onnx --output-cvg=scores --output-bbox=boxes --labels=labels.txt

Results: on the 1070 8GB the network time dropped about 2x, but preprocessing (because of the video stream, I think) took about 2.5 ms, so in total it's actually the same as on the Jetson Xavier NX 16.
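The "2x faster network, same total" observation can be checked with simple arithmetic. The thread doesn't give the exact Xavier timing, so the numbers below are only an illustration of the implied relationship, not measurements:

```python
# Illustration only: if the 1070's total time (network/2 + 2.5 ms
# preprocess) equals the Xavier's network time T, then
#   T/2 + 2.5 = T  ->  T = 5 ms.
preprocess_ms = 2.5
xavier_network_ms = 2 * preprocess_ms            # solving T/2 + 2.5 = T
gpu_network_ms = xavier_network_ms / 2
gpu_total_ms = gpu_network_ms + preprocess_ms

print(f"Implied Xavier network time:       {xavier_network_ms} ms")
print(f"1070 total (network + preprocess): {gpu_total_ms} ms")
```

In other words, once the stream-decoding/preprocessing overhead is as large as the inference savings, offloading to a faster GPU stops paying off.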

Both models have an input size of 512.


GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3840 x 2160 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 1920 x 1080 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;
GST_ARGUS: Running with following settings:
Camera index = 1
Camera mode = 1
Output Stream W = 1920 H = 1080
seconds to Run = 0
Frame Rate = 59.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstBufferManager -- map buffer size was less than max size (3110400 vs 3110407)
[gstreamer] gstBufferManager recieve caps: video/x-raw, width=(int)1920, height=(int)1080, framerate=(fraction)60/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1920 height=1080 size=3110407
[cuda] allocated 4 ring buffers (3110407 bytes each, 12441628 bytes total)
[cuda] allocated 4 ring buffers (8 bytes each, 32 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
[cuda] allocated 4 ring buffers (6220800 bytes each, 24883200 bytes total)
video-viewer: captured 0 frames (1920x1080)
[cuda] allocated 2 ring buffers (3110400 bytes each, 6220800 bytes total)
[gstreamer] gstEncoder -- starting pipeline, transitioning to GST_STATE_PLAYING
Opening in BLOCKING MODE
[gstreamer] gstreamer changed state from NULL to READY ==> udpsink0
[gstreamer] gstreamer changed state from NULL to READY ==> rtph265pay0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter3
[gstreamer] gstreamer changed state from NULL to READY ==> encoder
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter2
[gstreamer] gstreamer changed state from NULL to READY ==> vidconv
[gstreamer] gstreamer changed state from NULL to READY ==> mysource
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline1
[gstreamer] gstreamer changed state from READY to PAUSED ==> rtph265pay0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter3
[gstreamer] gstreamer changed state from READY to PAUSED ==> encoder
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter2
[gstreamer] gstreamer changed state from READY to PAUSED ==> vidconv
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysource
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline1
[gstreamer] gstreamer message new-clock ==> pipeline1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> rtph265pay0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter3
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> encoder
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter2
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> vidconv
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysource
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstEncoder -- new caps: video/x-raw, width=1920, height=1080, format=(string)I420, framerate=30/1
video-viewer: captured 1 frames (1920x1080)

And when detecting on 2 streams, the network time doubled - so comparing streaming with encoding/decoding against running the network directly on the Jetson, the Jetson is better.

Hi @raul.orav, thanks for sharing your updated results and info about the carrier board. Hopefully when that board is updated (or you get the newer one), you could then migrate to Orin and increase the performance that way.

Thnx.

I'm beginning to think the bottleneck is actually the bandwidth between the IMX477 and the Xavier. I'm not sure … I'm confused for the moment :)

I'm thinking I'll need something like this: Jetson AGX Orin Carrier Board - DSBOARD-AGX | Forecr.io. And I should also find new cameras - more than 60 FPS, at least Full HD resolution, and good light sensitivity.


At the moment I'm thinking about CSI bandwidth limitations. I must either confirm this hypothesis or rule it out somehow. This thread is a good use case for studying this field.

Clearing things up and understanding how it all works takes time, and the internet has a lot of misinformation. So before buying new toys, I need to figure out exactly what I need.

The IMX477 has MIPI 2-lane/4-lane D-PHY v1.2 connectors, and its output video format is RAW12/10/8, COMP8. From that I'd conclude I need a minimum of 2x4 or 4x4 lanes - but this information doesn't really help me yet.

D-PHY v1.2 - the MIPI Alliance D-PHY v1.2 specification extends the high-speed burst capability to 2.5 Gb/s per lane. That tells me that for one camera, even a single lane should be enough.

My rough estimation of the bitrate bandwidth:

World's Smallest AI Supercomputer: Jetson Xavier NX | NVIDIA says: 14 lanes (3 x4 + 1 x2, or 6 x2, or 5 x2 + 1 x4) MIPI CSI-2 | D-PHY 1.2 (2.5 Gb/s per pair, up to 30 Gb/s total). What's a "pair"? (Each MIPI lane is a differential pair of wires, so "per pair" effectively means per lane.)

Let's assume for a second that my cameras "consume" 2 or 4 lanes each, so I'd get 3 cameras x 4 lanes + 1 x 2 lanes - that should be enough. Even 1 lane should be enough, bandwidth-wise. And if the cameras spread their bandwidth equally over the lanes, it should also be fine :)

Multi Cameras on Jetson Xavier NX: Best Camera Multiplexing Solutions for your NX Dev Kit - Arducam says:

All recommendations are based on the official Xavier NX dev kit, The NX module itself is a beast, it has 14 (3×4 or 6×2) MIPI CSI-2 lanes, and with D-PHY v1.2 it is able to handle data at a maximum of 30 Gbps, yet the official dev kit only offers 2 lanes, which means if you want to embrace the MIPI interface in its full glory, you need third party carrier boards.

I'm using this: reComputer Jetson-J20-02x datasheet (robotshop.com) - it should be the production version. How can I verify that?

description: Computer
product: NVIDIA Jetson Xavier NX Developer Kit
vendor: Unknown
version: Not Specified
serial: 1421022050617
width: 64 bits
capabilities: smbios-3.0.0 dmi-3.0.0 smp cp15_barrier setend swp tagged_addr_disabled
configuration: boot=normal family=Unknown sku=Unknown

Maybe I flashed it with the wrong software, or maybe it just identifies itself as a devkit. Still, if 2 cameras are working at 60 FPS, then at least 2x2 lanes are in use, and at least 2x2 more should be available.

Baseboard is: A205 Carrier Board for Jetson Nano/Xavier NX - RobotShop.

Maybe it's about the kernel and the device tree ("device manager" in .dts form) and its configuration, or maybe the NX processor simply doesn't handle it - I don't know. For now I'm giving up on this.

So - I can't confirm the hypothesis, but I can't refute it either :) and I've got more leads…
