Xavier NX 16 and 4 cameras with jetson-inference - some common questions

When using threading, I get almost the same result - the only difference is that the GPU is constantly at 99% (before, it fluctuated around 90%). :)

It means that my proposition was wrong - about getting better results by using resources in parallel (though I did squeeze everything out). I'm short on CPU and GPU resources, and my model is slow.

Dusty, what do you think - is it possible to get my model running in about 3 ms, i.e. around 0.0035 sec per inference?

Xavier also has the DLA engines (Deep Learning Accelerators) which can offload models from the GPU, but they aren’t typically faster than GPU (they are optimized for power efficiency) and they may not support all the layers of SSD-Mobilenet. I don’t typically run the DLAs with jetson-inference, but DeepStream does. In reality if you want the maximum performance, you would migrate to DeepStream and use a TAO-trained model.

Other ways to speed up the model: reduce its resolution, reduce the number of classes, use INT8, or use a different network architecture (YOLO is common but would require code changes, and I don't think it would be significantly faster).
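To put the 3 ms target in context, here is a back-of-the-envelope sketch of the per-inference time budget, assuming the setup described in this thread (4 cameras at 60 FPS sharing one GPU that runs inferences serially; the exact numbers are assumptions, not measurements from the thread):

```python
# Assumed setup from the thread: 4 cameras, 60 FPS each, one GPU,
# inferences run back-to-back (no batching).
cameras = 4
fps = 60

total_inferences_per_sec = cameras * fps          # 240 inferences/sec
budget_ms = 1000.0 / total_inferences_per_sec     # ~4.17 ms per inference

print(f"Inferences needed per second: {total_inferences_per_sec}")
print(f"Time budget per inference:    {budget_ms:.2f} ms")

# A ~3 ms model would leave a little headroom for pre/post-processing:
model_ms = 3.0
print(f"Headroom with a 3 ms model:   {budget_ms - model_ms:.2f} ms")
```

So the ~3 ms goal is roughly what serial inference on all 4 streams would require; batching the 4 frames together would relax this per-frame constraint.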


Thanks for the explanation. I'm currently researching how I can convert the model to INT8, or how to train it directly in INT8. I hope it will run with detectnet :)

It feels (and feeling is not the best instrument for judging this, but…) that even with DeepStream I won't get the required speed - but I could with INT8.

I registered for: DLI Instructor-Led Workshop - Fundamentals of Deep Learning [DLIW52079] - maybe I'll get more basic insights from there.

Until then, I'll try whether this helps: intel/neural-compressor (github.com) - Intel® Neural Compressor (formerly the Intel® Low Precision Optimization Tool), which provides unified APIs for network compression techniques such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks.

And if model quantization doesn't help, I'll move to x86 hardware or look into DeepStream and C++.

Can you give me a good hint on how to start with TAO detectnet_v2 models? I have my dataset in CVAT, so I can export it to any format from there. Do you know any good examples or tutorials for getting started with TAO?

I’d check out these getting started resources for TAO:

It looks like CVAT can export to KITTI data format and TAO can import KITTI format. I’m not an expert on using TAO though, so if you encounter issues there you’ll probably want to open a new topic on the TAO forum.


Thanks, yes, I understand. Your help was already more than enough, thank you.

I sent them a question. Will let you know.

Any ideas where to get an mb1-ssd-lite pretrained model? mb1-ssd-lite should be better than mb1-ssd?

By the way, while streaming 4 x 1920x1080@60FPS through the encoder, CPU usage is about the same, around 90%. So I'm thinking of moving inference to a server where I have 1070 cards.

The ones they have for it are on: https://github.com/qfgaohao/pytorch-ssd
It looks like there is mb2-ssd-lite: https://github.com/qfgaohao/pytorch-ssd#mobilenetv2-ssd-lite

Yes, thanks, that I know.

The A205 can be partially compatible with Orin NX/Nano, but some features will be limited.
I suggest you buy the new carrier board we launched in April directly, which is specially designed for the Orin series.

I'll try to find out which features will be limited.

In case anyone is interested, I sent the CSI camera stream to a PC with: video-viewer --bitrate=8000000 csi://0 rtp://10.199.1.13:5000 --output-codec=h265 --input-width=1920 --input-height=1080 --input-rate=60 (but it doesn't output 60 FPS, only 30 when checked).

And on the PC I ran detectnet: detectnet rtp://@:5000 file://tere.mp4 --headless --input-codec=h265 --batch-size=8 --model=ssd-mobilnet-v2.onnx --output-cvg=scores --output-bbox=boxes --labels=labels.txt

Results: on the 1070 8GB the network time dropped about 2x, but preprocessing (because of the video stream, I think) took about 2.5 ms, so in total it's actually the same as on the Jetson Xavier NX 16.
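The "2x faster network, same total" observation can be checked with simple arithmetic. The thread doesn't give the exact Xavier timing, so the numbers below are only an illustration of the implied relationship, not measurements:

```python
# Illustration only: if the 1070's total time (network/2 + 2.5 ms
# preprocess) equals the Xavier's network time T, then
#   T/2 + 2.5 = T  ->  T = 5 ms.
preprocess_ms = 2.5
xavier_network_ms = 2 * preprocess_ms            # solving T/2 + 2.5 = T
gpu_network_ms = xavier_network_ms / 2
gpu_total_ms = gpu_network_ms + preprocess_ms

print(f"Implied Xavier network time:       {xavier_network_ms} ms")
print(f"1070 total (network + preprocess): {gpu_total_ms} ms")
```

In other words, once the stream-decoding/preprocessing overhead is as large as the inference savings, offloading to a faster GPU stops paying off.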

Both models have an input size of 512.


GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3840 x 2160 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 1920 x 1080 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;
GST_ARGUS: Running with following settings:
Camera index = 1
Camera mode = 1
Output Stream W = 1920 H = 1080
seconds to Run = 0
Frame Rate = 59.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstBufferManager -- map buffer size was less than max size (3110400 vs 3110407)
[gstreamer] gstBufferManager recieve caps: video/x-raw, width=(int)1920, height=(int)1080, framerate=(fraction)60/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1920 height=1080 size=3110407
[cuda] allocated 4 ring buffers (3110407 bytes each, 12441628 bytes total)
[cuda] allocated 4 ring buffers (8 bytes each, 32 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
[cuda] allocated 4 ring buffers (6220800 bytes each, 24883200 bytes total)
video-viewer: captured 0 frames (1920x1080)
[cuda] allocated 2 ring buffers (3110400 bytes each, 6220800 bytes total)
[gstreamer] gstEncoder -- starting pipeline, transitioning to GST_STATE_PLAYING
Opening in BLOCKING MODE
[gstreamer] gstreamer changed state from NULL to READY ==> udpsink0
[gstreamer] gstreamer changed state from NULL to READY ==> rtph265pay0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter3
[gstreamer] gstreamer changed state from NULL to READY ==> encoder
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter2
[gstreamer] gstreamer changed state from NULL to READY ==> vidconv
[gstreamer] gstreamer changed state from NULL to READY ==> mysource
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline1
[gstreamer] gstreamer changed state from READY to PAUSED ==> rtph265pay0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter3
[gstreamer] gstreamer changed state from READY to PAUSED ==> encoder
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter2
[gstreamer] gstreamer changed state from READY to PAUSED ==> vidconv
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysource
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline1
[gstreamer] gstreamer message new-clock ==> pipeline1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> rtph265pay0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter3
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> encoder
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter2
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> vidconv
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysource
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstEncoder -- new caps: video/x-raw, width=1920, height=1080, format=(string)I420, framerate=30/1
video-viewer: captured 1 frames (1920x1080)

And when detecting on 2 streams, the network time doubled - so comparing streaming with encoding/decoding against running the network directly on the Jetson, the Jetson is better.

Hi @raul.orav, thanks for sharing your updated results and info about the carrier board. Hopefully when that board is updated (or you get the newer one), you could then migrate to Orin and increase the performance that way.

Thnx.

I'm beginning to think the bottleneck is actually the bandwidth between the IMX477 and the Xavier. I'm not sure … I'm confused for the moment :)

I'm thinking I'll need something like this: Jetson AGX Orin Carrier Board - DSBOARD-AGX | Forecr.io. And I should also find new cameras - more than 60 FPS, at least Full HD resolution, and good light sensitivity.


At the moment I'm thinking about CSI bandwidth limitations. I must either confirm this hypothesis or rule it out somehow. This thread is a good use case for studying this field.

Clearing things up and understanding how it all works takes time, and the internet has a lot of misinformation. So before buying new toys, I need to figure out exactly what I need.

The IMX477 has MIPI 2-lane/4-lane D-PHY v1.2 connectors, and its output video format is RAW12/10/8, COMP8. From that I'd conclude I need a minimum of 2x4 or 4x4 lanes - but this information doesn't really help me yet.

D-PHY v1.2 - the MIPI Alliance D-PHY v1.2 specification extends the high-speed burst capability to 2.5 Gb/s per lane. That tells me that for one camera, even a single lane should be enough.

My rough estimation of the bitrate bandwidth:

World's Smallest AI Supercomputer: Jetson Xavier NX | NVIDIA says: 14 lanes (3 x4 + 1 x2, or 6 x2, or 5 x2 + 1 x4) MIPI CSI-2 | D-PHY 1.2 (2.5 Gb/s per pair, up to 30 Gb/s total). What's a "pair"? (Each MIPI lane is a differential pair of wires, so "per pair" effectively means per lane.)

Let's assume for a second that my cameras "consume" 2 or 4 lanes each, so I'd get 3 cameras x 4 lanes + 1 x 2 lanes - that should be enough. Even 1 lane should be enough, bandwidth-wise. And if the cameras spread their bandwidth equally over the lanes, it should also be fine :)

Multi Cameras on Jetson Xavier NX: Best Camera Multiplexing Solutions for your NX Dev Kit - Arducam says:

All recommendations are based on the official Xavier NX dev kit, The NX module itself is a beast, it has 14 (3×4 or 6×2) MIPI CSI-2 lanes, and with D-PHY v1.2 it is able to handle data at a maximum of 30 Gbps, yet the official dev kit only offers 2 lanes, which means if you want to embrace the MIPI interface in its full glory, you need third party carrier boards.

I'm using this: reComputer Jetson-J20-02x datasheet (robotshop.com) - it should be the production version. How can I verify that?

description: Computer
product: NVIDIA Jetson Xavier NX Developer Kit
vendor: Unknown
version: Not Specified
serial: 1421022050617
width: 64 bits
capabilities: smbios-3.0.0 dmi-3.0.0 smp cp15_barrier setend swp tagged_addr_disabled
configuration: boot=normal family=Unknown sku=Unknown

Maybe I flashed it with the wrong software, or maybe it just identifies itself as a devkit. Still, if 2 cameras are working at 60 FPS, then at least 2x2 lanes are in use, and at least 2x2 more should be available.

Baseboard is: A205 Carrier Board for Jetson Nano/Xavier NX - RobotShop.

Maybe it's about the kernel and the device tree ("device manager" in .dts form) and its configuration, or maybe the NX processor simply doesn't handle it - I don't know. For now I'm giving up on this.

So - I can't confirm the hypothesis, but I can't refute it either :) and I've got more leads…
