Jetson AGX Orin 64 GB source30_1080p_dec_infer-resnet_tiled_display_int8 deepstream expected FPS

For a sample application in deepstream:
source30_1080p_dec_infer-resnet_tiled_display_int8 : Demonstrates 30 stream decodes with primary inferencing

For a file sample_1080p_h264.mp4
Detector only: Resnet18 (960 × 544) Batchsize 30 int8

PERF: FPS 0 (Avg) FPS 1 (Avg) FPS 2 (Avg) FPS 3 (Avg) FPS 4 (Avg) FPS 5 (Avg) FPS 6 (Avg) FPS 7 (Avg) FPS 8 (Avg) FPS 9 (Avg) FPS 10 (Avg) FPS 11 (Avg) FPS 12 (Avg) FPS 13 (Avg) FPS 14 (Avg) FPS 15 (Avg) FPS 16 (Avg) FPS 17 (Avg) FPS 18 (Avg) FPS 19 (Avg) FPS 20 (Avg) FPS 21 (Avg) FPS 22 (Avg) FPS 23 (Avg) FPS 24 (Avg) FPS 25 (Avg) FPS 26 (Avg) FPS 27 (Avg) FPS 28 (Avg) FPS 29 (Avg)
PERF: 15.91 (15.60) 15.55 (15.61) 15.76 (15.59) 15.76 (15.59) 15.83 (15.58) 15.62 (15.59) 15.62 (15.61) 15.83 (15.58) 15.82 (15.60) 15.83 (15.61) 15.62 (15.59) 15.66 (15.58) 15.91 (15.60) 15.86 (15.61) 15.62 (15.61) 15.66 (15.61) 15.76 (15.59) 15.83 (15.59) 15.62 (15.61) 15.83 (15.58) 15.62 (15.59) 15.74 (15.62) 15.62 (15.61) 15.83 (15.59) 15.83 (15.61) 15.91 (15.59) 15.91 (15.59) 15.66 (15.58) 15.83 (15.58) 15.62 (15.59)

Tegrastats:

05-27-2024 14:54:31 RAM 12430/62841MB (lfb 2x4MB) SWAP 0/31421MB (cached 0MB) CPU [10%@2201,4%@2201,2%@2201,4%@2201,3%@2201,11%@2201,7%@2201,4%@2201,4%@2201,4%@2201,8%@2201,4%@2201] GR3D_FREQ 47% cpu@45.781C tboard@32.75C soc2@42.468C tdiode@35C soc0@42.812C gpu@43.25C tj@45.781C soc1@41.281C VDD_GPU_SOC 16051mW/16749mW VDD_CPU_CV 2407mW/2407mW VIN_SYS_5V0 5966mW/6092mW VDDQ_VDD2_1V8AO 2221mW/2271mW
05-27-2024 14:54:32 RAM 12430/62841MB (lfb 2x4MB) SWAP 0/31421MB (cached 0MB) CPU [12%@2201,4%@2201,4%@2201,6%@2201,7%@2201,11%@2201,6%@2201,4%@2201,4%@2201,4%@2201,9%@2201,7%@2201] GR3D_FREQ 58% cpu@45.843C tboard@32.75C soc2@42.468C tdiode@35.25C soc0@42.906C gpu@43.156C tj@45.875C soc1@41.25C VDD_GPU_SOC 16452mW/16690mW VDD_CPU_CV 2407mW/2407mW VIN_SYS_5V0 5966mW/6067mW VDDQ_VDD2_1V8AO 2221mW/2261mW
05-27-2024 14:54:33 RAM 12430/62841MB (lfb 2x4MB) SWAP 0/31421MB (cached 0MB) CPU [10%@2201,5%@2201,4%@2201,5%@2201,8%@2201,12%@2201,3%@2201,4%@2201,4%@2201,4%@2201,13%@2201,3%@2201] GR3D_FREQ 27% cpu@45.937C tboard@32.75C soc2@42.562C tdiode@35.25C soc0@43.031C gpu@43.281C tj@46.062C soc1@41.281C VDD_GPU_SOC 16853mW/16717mW VDD_CPU_CV 2407mW/2407mW VIN_SYS_5V0 6168mW/6083mW VDDQ_VDD2_1V8AO 2322mW/2271mW
05-27-2024 14:54:34 RAM 12430/62841MB (lfb 2x4MB) SWAP 0/31421MB (cached 0MB) CPU [11%@2201,4%@2201,4%@2201,3%@2201,7%@2201,10%@2201,3%@2201,4%@2201,6%@2201,4%@2201,5%@2201,13%@2201] GR3D_FREQ 41% cpu@45.875C tboard@32.875C soc2@42.812C tdiode@35.375C soc0@43.187C gpu@44C tj@45.875C soc1@41.343C VDD_GPU_SOC 17248mW/16793mW VDD_CPU_CV 2407mW/2407mW VIN_SYS_5V0 6168mW/6095mW VDDQ_VDD2_1V8AO 2322mW/2278mW
05-27-2024 14:54:35 RAM 12430/62841MB (lfb 2x4MB) SWAP 0/31421MB (cached 0MB) CPU [13%@2201,4%@2201,3%@2201,6%@2201,6%@2201,6%@2201,8%@2201,5%@2201,5%@2201,4%@2201,7%@2201,9%@2201] GR3D_FREQ 79% cpu@46C tboard@32.875C soc2@42.687C tdiode@35.25C soc0@43.25C gpu@43.781C tj@46C soc1@41.312C VDD_GPU_SOC 17255mW/16850mW VDD_CPU_CV 2407mW/2407mW VIN_SYS_5V0 6168mW/6104mW VDDQ_VDD2_1V8AO 2322mW/2284mW

Now, if I run the above two times, the FPS drops:

PERF: FPS 0 (Avg) FPS 1 (Avg) FPS 2 (Avg) FPS 3 (Avg) FPS 4 (Avg) FPS 5 (Avg) FPS 6 (Avg) FPS 7 (Avg) FPS 8 (Avg) FPS 9 (Avg) FPS 10 (Avg) FPS 11 (Avg) FPS 12 (Avg) FPS 13 (Avg) FPS 14 (Avg) FPS 15 (Avg) FPS 16 (Avg) FPS 17 (Avg) FPS 18 (Avg) FPS 19 (Avg) FPS 20 (Avg) FPS 21 (Avg) FPS 22 (Avg) FPS 23 (Avg) FPS 24 (Avg) FPS 25 (Avg) FPS 26 (Avg) FPS 27 (Avg) FPS 28 (Avg) FPS 29 (Avg)
PERF: 7.80 (7.80) 7.80 (7.78) 7.80 (7.80) 7.80 (7.78) 7.80 (7.79) 7.80 (7.79) 7.82 (7.78) 7.80 (7.79) 7.80 (7.79) 8.00 (7.79) 7.80 (7.79) 7.82 (7.78) 7.80 (7.78) 7.82 (7.78) 7.82 (7.78) 7.80 (7.80) 7.80 (7.80) 7.82 (7.78) 7.80 (7.79) 7.82 (7.78) 7.80 (7.78) 7.80 (7.80) 7.80 (7.79) 7.80 (7.79) 7.80 (7.79) 7.80 (7.79) 7.80 (7.79) 7.80 (7.79) 7.80 (7.79) 7.80 (7.79)
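To compare the two runs, it can help to sum the per-stream averages into one aggregate figure. Below is a minimal sketch that parses a `PERF:` values line; the helper names `parse_perf_line` and `aggregate_avg_fps` are made up for illustration, and the sample strings are only the first three streams from each log above.

```python
import re

def parse_perf_line(line):
    """Return (current, average) FPS pairs from one 'PERF:' values line.

    The header line ('FPS 0 (Avg) ...') yields no pairs because its
    parentheses contain no numbers.
    """
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s*\((\d+(?:\.\d+)?)\)", line)
    return [(float(cur), float(avg)) for cur, avg in pairs]

def aggregate_avg_fps(line):
    """Sum the per-stream average FPS to get total pipeline throughput."""
    return sum(avg for _, avg in parse_perf_line(line))

# Truncated samples: first three streams of each run shown above.
single = "PERF: 15.91 (15.60) 15.55 (15.61) 15.76 (15.59)"
double = "PERF: 7.80 (7.80) 7.80 (7.78) 7.80 (7.80)"

print(round(aggregate_avg_fps(single), 2))
print(round(aggregate_avg_fps(double), 2))
```

Over all 30 streams, the first run averages about 15.6 FPS per stream, so roughly 468 FPS in total. Each stream in the second case averages about 7.8 FPS. If the two runs are concurrent (an assumption), that is 2 × 30 × 7.8 ≈ 468 FPS again, which would be consistent with a single shared resource capping total throughput.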

The power of the board is set to max.

sudo nvpmodel -m 0
sudo jetson_clocks

The Orin video decoder only supports 1x 4K60 | 2x 4K30 | 5x 1080p60 | 11x 1080p30 (H.264) (Source).
Can this be the issue?
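As a rough sanity check, the quoted 11x 1080p30 rating can be turned into a per-stream budget. This is only a sketch: it assumes decoder throughput is shared equally across streams and scales linearly, and `per_stream_fps_budget` is a hypothetical helper, not part of any NVIDIA tool.

```python
def per_stream_fps_budget(rated_streams, rated_fps, active_streams):
    """Split the decoder's rated total frame rate evenly across streams.

    Assumes linear scaling and equal sharing, which real hardware
    may not follow exactly.
    """
    total_fps = rated_streams * rated_fps  # e.g. 11 * 30 = 330 frames/s
    return total_fps / active_streams

print(per_stream_fps_budget(11, 30, 30))  # 30 streams, one pipeline
print(per_stream_fps_budget(11, 30, 60))  # 60 streams, two pipelines
```

Under these assumptions the rating works out to 11 FPS per stream for 30 streams, and 5.5 FPS for 60. The measured 15.6 and 7.8 FPS are somewhat higher than that, so the published rating looks conservative, but both are well below 30 FPS per stream.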

If you want to know whether the video decoder is the bottleneck, you may view the decoder usage with the “Jetson Power GUI” tool.

Thanks for the reply @Fiona.Chen.

Where is the decoder usage in the Power GUI tool? All the listed parameters are present in my tegrastats logs.

Also, what is the expected FPS for the above sample test in DeepStream SDK 7?

It is in the red circle.

It depends on the slowest module in the pipeline.
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Performance.html#jetson-agx-orin


I feel the Jetson Power GUI is just a visualization of tegrastats.
The only difference is that tegrastats has to be run with admin privileges to show the same values.

For the above test, the bottleneck could be decoding (NVDEC is at 98%):

05-28-2024 10:07:37 RAM 12449/62841MB (lfb 1x4MB) SWAP 0/31421MB (cached 0MB) CPU [13%@2201,5%@2201,10%@2201,4%@2201,5%@2201,12%@2201,7%@2201,10%@2201,2%@2201,3%@2201,3%@2201,3%@2201] EMC_FREQ 22%@3199 GR3D_FREQ 31%@[1296,1296] NVENC off NVDEC 98%@998 NVJPG off NVJPG1 off VIC 98%@729 OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@46.875C tboard@33.75C soc2@43.593C tdiode@36C soc0@43.843C gpu@46.125C tj@46.781C soc1@42.343C VDD_GPU_SOC 16853mW/16673mW VDD_CPU_CV 2407mW/2407mW VIN_SYS_5V0 6168mW/6103mW VDDQ_VDD2_1V8AO 2322mW/2257mW
05-28-2024 10:07:38 RAM 12449/62841MB (lfb 1x4MB) SWAP 0/31421MB (cached 0MB) CPU [11%@2201,5%@2201,6%@2201,3%@2201,6%@2201,5%@2201,2%@2201,11%@2201,7%@2201,5%@2201,4%@2201,4%@2201] EMC_FREQ 23%@3199 GR3D_FREQ 80%@[1300,1294] NVENC off NVDEC 98%@998 NVJPG off NVJPG1 off VIC 95%@729 OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@46.843C tboard@33.75C soc2@43.625C tdiode@36.125C soc0@43.875C gpu@46.25C tj@46.843C soc1@42.281C VDD_GPU_SOC 16452mW/16655mW VDD_CPU_CV 2407mW/2407mW VIN_SYS_5V0 6168mW/6109mW VDDQ_VDD2_1V8AO 2322mW/2263mW
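To check this over a longer log instead of reading it by eye, the engine utilization fields can be pulled out of each tegrastats line. The following is a minimal sketch based only on the field layout visible in the two lines above (`NAME off` or `NAME NN%@...`); other JetPack versions may print different fields, and the function names are made up for illustration.

```python
import re

# Matches fields such as "NVDEC 98%@998", "GR3D_FREQ 31%@[1296,1296]", "NVENC off".
ENGINE_RE = re.compile(r"\b(NVDEC|NVENC|VIC|EMC_FREQ|GR3D_FREQ)\s+(off|(\d+)%)")

def engine_utilization(line):
    """Map engine name to utilization percent (None if reported as 'off')."""
    util = {}
    for name, state, pct in ENGINE_RE.findall(line):
        util[name] = None if state == "off" else int(pct)
    return util

def saturated(util, threshold=90):
    """Return the engines at or above the threshold, sorted by name."""
    return sorted(n for n, p in util.items() if p is not None and p >= threshold)

# Excerpt of the first tegrastats line above.
sample = ("EMC_FREQ 22%@3199 GR3D_FREQ 31%@[1296,1296] NVENC off "
          "NVDEC 98%@998 NVJPG off NVJPG1 off VIC 98%@729 OFA off")

util = engine_utilization(sample)
print(util)
print(saturated(util))
```

On both sample lines this flags NVDEC and VIC, while GR3D_FREQ (the GPU) varies between 31% and 80%. The 90% threshold is an arbitrary choice for the sketch.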

Is this analysis correct? Are there any suggestions to improve the numbers, or is this the benchmark for the board with this config (txt) file?

Yes.

The AGX Orin performance data is already provided in the Performance page of the DeepStream 6.4 documentation (nvidia.com).

Please read this first: Performance — DeepStream documentation 6.4 documentation (nvidia.com)

Pipeline performance changes with the configuration, because different configurations use different hardware resources. A common rule is: “The performance depends on the slowest module in the pipeline.”

If your requirement is to decode 30 1080p@30fps H.264 videos with one AGX Orin board, it cannot be met given the hardware capability. You may need to change that requirement.

