Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) Jetson AGX Xavier 64GB Industrial (Auvidea), Jetson AGX Xavier 64GB Industrial (Forecr), Jetson AGX Orin Developer board
• DeepStream Version 6.1.1, 6.2, 6.0.1
• JetPack Version (valid for Jetson only) 5.0.2, 5.1
• TensorRT Version 8.4.1, 8.5.1
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs) question/bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
I have a DeepStream Python application that is similar to the rtsp-in-rtsp-out sample from the deepstream python apps. The pipeline handles a variable number of UDP multicast input streams through source bins, each containing udpsource, rtpjpegdepay, jpegdecoder and nvvideoconvert elements. These source bins attach to the streammux, which is linked to the pgie (a YOLOv8 nano model using the deepstream-yolo implementation), and the pgie links to an NvDCF tracker. The tracker is linked either to a fakesink or to a setup similar to the rtsp-in-rtsp-out app that restreams the tiled OSD via multicast and RTSP.
This application runs stably on an RTX2080, easily managing a throughput of 25 FPS per input stream (realtime) for 4 input streams without any stutters, and I can achieve a stable throughput of 17 FPS for 4 input streams on a Jetson AGX Orin developer board with Jetpack 5.0.2. When I run the same application on a Jetson AGX Xavier device, the FPS is all over the place: sometimes it reaches near real-time throughput on 3 of the 4 streams, and sometimes almost no throughput on 3 of the 4 streams.
I have tested this on 2 different Xavier devices from 2 different manufacturers (Auvidea and Forecr) and with a variety of settings: DLA and VA settings on the pgie, INT8 precision for the pgie, turning off the restreamed OSD result, different batch size settings on the muxer and pgie, power mode settings, and jetson_clocks.
I am wondering if I’m doing anything wrong here, possibly missing a certain setting or feature that might be key to unlocking optimal performance on these devices. Should it be possible to run an application like this with 4x 960x1280 streams over UDP multicast on a deepstream app with a 640x640 pgie and a tracker on an industrial Xavier device or is it not powerful enough for this? The FPS also seems to be unstable on the Xavier devices when running just 2 input streams.
Below I’ll post FPS probe data from all 3 Jetson devices, together with their tegrastats output while running the app, for reference. The FPS data was collected using this code with a buffer probe on the pgie (a probe on the tracker yields similar results).
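For readers without access to the linked code, here is a minimal sketch of a per-stream FPS counter that produces reports in the same shape as the **PERF lines below. It is not the code used for these measurements. The class name `PerStreamFps`, its parameters, and the 5 second interval are assumptions made for illustration.

```python
import time


class PerStreamFps:
    """Counts frames per stream and reports FPS over fixed intervals.

    Hypothetical helper for illustration; not the code linked in the post.
    """

    def __init__(self, num_streams, interval_s=5.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.window_start = clock()
        # Pre-register every stream so that stalled streams report 0.0
        # instead of silently disappearing from the report.
        self.counts = {i: 0 for i in range(num_streams)}

    def update(self, stream_id):
        """Call once per frame with the index of the stream it came from."""
        self.counts[stream_id] = self.counts.get(stream_id, 0) + 1

    def poll(self):
        """Return {'streamN': fps} once the interval has elapsed, else None."""
        now = self.clock()
        elapsed = now - self.window_start
        if elapsed < self.interval_s:
            return None
        report = {
            f"stream{sid}": round(count / elapsed, 2)
            for sid, count in sorted(self.counts.items())
        }
        # Start a new measurement window.
        self.counts = {sid: 0 for sid in self.counts}
        self.window_start = now
        return report


if __name__ == "__main__":
    # Simulated usage with a fake clock instead of a real pipeline.
    fake_now = [0.0]
    fps = PerStreamFps(num_streams=4, interval_s=5.0, clock=lambda: fake_now[0])
    for _ in range(100):
        fps.update(1)
    fake_now[0] = 5.0
    print("**PERF:", fps.poll())
```

In a buffer probe, `update` would be called once for every frame in the batch with that frame's source index, and `poll` would be called at the end of the probe or from a timer. Counting per frame rather than per batch matters here, because a batch can contain several frames from one stream and none from another.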
AGX Xavier 64GB Industrial (Auvidea) - Jetpack 5.0.2 - power mode 30W ALL - batch size 8 - INT8 - 4 streams - RTSP restream off
**PERF: {'stream0': 1.2, 'stream1': 19.98, 'stream2': 0.0, 'stream3': 0.0}
**PERF: {'stream0': 7.59, 'stream1': 17.39, 'stream2': 4.0, 'stream3': 0.8}
**PERF: {'stream0': 5.0, 'stream1': 18.19, 'stream2': 1.4, 'stream3': 0.0}
**PERF: {'stream0': 12.39, 'stream1': 16.39, 'stream2': 6.2, 'stream3': 1.0}
**PERF: {'stream0': 9.39, 'stream1': 17.38, 'stream2': 0.2, 'stream3': 0.2}
**PERF: {'stream0': 7.2, 'stream1': 16.79, 'stream2': 4.2, 'stream3': 0.4}
**PERF: {'stream0': 13.99, 'stream1': 15.39, 'stream2': 3.8, 'stream3': 1.6}
**PERF: {'stream0': 14.39, 'stream1': 16.59, 'stream2': 9.59, 'stream3': 9.79}
**PERF: {'stream0': 16.58, 'stream1': 16.58, 'stream2': 15.98, 'stream3': 16.58}
**PERF: {'stream0': 16.39, 'stream1': 16.39, 'stream2': 15.59, 'stream3': 16.19}
**PERF: {'stream0': 16.39, 'stream1': 16.39, 'stream2': 10.39, 'stream3': 15.99}
**PERF: {'stream0': 16.39, 'stream1': 16.39, 'stream2': 12.59, 'stream3': 15.99}
09-28-2023 12:19:29 RAM 8626/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [37%@1190,32%@1190,12%@1190,16%@1190,14%@1190,21%@1190,29%@1190,16%@1189] EMC_FREQ 38%@1600 GR3D_FREQ 0%@1377 NVENC 115 NVENC1 115 VIC_FREQ 67%@192 APE 150 AUX@46.5C CPU@49.5C thermal@48.6C Tboard@48C AO@46.5C GPU@50C Tdiode@49.25C PMIC@50C GPU 6740mW/6441mW CPU 1007mW/1016mW SOC 2718mW/2586mW CV 0mW/0mW VDDRQ 1607mW/1532mW SYS5V 3995mW/3951mW
09-28-2023 12:19:30 RAM 8626/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [32%@1189,23%@1190,14%@1190,18%@1190,12%@1190,6%@1189,24%@1190,25%@1190] EMC_FREQ 38%@1600 GR3D_FREQ 55%@1377 NVENC 115 NVENC1 115 VIC_FREQ 68%@204 APE 150 AUX@46.5C CPU@49.5C thermal@48.45C Tboard@48C AO@47C GPU@50C Tdiode@49C PMIC@50C GPU 6745mW/6442mW CPU 1006mW/1016mW SOC 2718mW/2586mW CV 0mW/0mW VDDRQ 1607mW/1532mW SYS5V 3995mW/3951mW
09-28-2023 12:19:31 RAM 8626/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [42%@1190,22%@1189,24%@1190,14%@1189,17%@1189,14%@1190,1%@1190,9%@1190] EMC_FREQ 37%@1600 GR3D_FREQ 29%@1377 NVENC 115 NVENC1 115 VIC_FREQ 68%@217 APE 150 AUX@46.5C CPU@49C thermal@48.75C Tboard@48C AO@47C GPU@50.5C Tdiode@49C PMIC@50C GPU 6941mW/6444mW CPU 906mW/1015mW SOC 2718mW/2587mW CV 0mW/0mW VDDRQ 1607mW/1533mW SYS5V 3995mW/3952mW
09-28-2023 12:19:32 RAM 8625/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [35%@1190,27%@1190,23%@1185,22%@1190,14%@1190,14%@1190,10%@1190,12%@1190] EMC_FREQ 38%@1600 GR3D_FREQ 21%@1377 NVENC 115 NVENC1 115 VIC_FREQ 68%@230 APE 150 AUX@46.5C CPU@49.5C thermal@48.9C Tboard@48C AO@47C GPU@50C Tdiode@49C PMIC@50C GPU 6841mW/6446mW CPU 906mW/1015mW SOC 2718mW/2587mW CV 0mW/0mW VDDRQ 1607mW/1533mW SYS5V 4034mW/3952mW
09-28-2023 12:19:33 RAM 8620/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [47%@1190,29%@1192,16%@1190,14%@1190,15%@1190,9%@1190,10%@1185,12%@1190] EMC_FREQ 38%@1600 GR3D_FREQ 0%@1377 NVENC 115 NVENC1 115 VIC_FREQ 71%@217 APE 150 AUX@46.5C CPU@49C thermal@48.45C Tboard@48C AO@47C GPU@50C Tdiode@49C PMIC@50C GPU 6640mW/6446mW CPU 1006mW/1015mW SOC 2718mW/2588mW CV 0mW/0mW VDDRQ 1607mW/1533mW SYS5V 3995mW/3952mW
09-28-2023 12:19:34 RAM 8620/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [50%@1190,27%@1192,19%@1190,13%@1190,16%@1190,21%@1190,22%@1191,20%@1189] EMC_FREQ 38%@1600 GR3D_FREQ 19%@1377 NVENC 115 NVENC1 115 VIC_FREQ 74%@153 APE 150 AUX@46.5C CPU@49C thermal@48.45C Tboard@48C AO@47C GPU@49.5C Tdiode@49C PMIC@50C GPU 6539mW/6447mW CPU 1007mW/1015mW SOC 2718mW/2588mW CV 0mW/0mW VDDRQ 1507mW/1533mW SYS5V 3995mW/3952mW
09-28-2023 12:19:35 RAM 8620/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [37%@1190,22%@1190,16%@1190,14%@1190,14%@1189,20%@1192,18%@1190,23%@1190] EMC_FREQ 38%@1600 GR3D_FREQ 48%@1377 NVENC 115 NVENC1 115 VIC_FREQ 68%@153 APE 150 AUX@46.5C CPU@49C thermal@48.6C Tboard@48C AO@47C GPU@50.5C Tdiode@49C PMIC@50C GPU 6841mW/6449mW CPU 1006mW/1015mW SOC 2718mW/2589mW CV 0mW/0mW VDDRQ 1607mW/1534mW SYS5V 3995mW/3953mW
EMC_FREQ 74%@1600 GR3D_FREQ 65%@1377 VIC_FREQ 34%@115 APE 150 AUX@49C CPU@53C thermal@52.6C Tboard@50C AO@50C GPU@56.5C Tdiode@52C PMIC@50C GPU 13739mW/7450mW CPU 1004mW/1042mW SOC 3110mW/2691mW CV 0mW/0mW VDDRQ 2403mW/1637mW SYS5V 4457mW/4019mW
09-28-2023 12:23:46 RAM 8563/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [33%@1189,17%@1190,39%@1186,15%@1190,42%@1190,5%@1190,13%@1190,16%@1190] EMC_FREQ 75%@1600 GR3D_FREQ 59%@1377 VIC_FREQ 40%@115 APE 150 AUX@49C CPU@52.5C thermal@52.6C Tboard@50C AO@50C GPU@55.5C Tdiode@52.25C PMIC@50C GPU 13447mW/7462mW CPU 1004mW/1042mW SOC 3010mW/2692mW CV 0mW/0mW VDDRQ 2305mW/1639mW SYS5V 4418mW/4020mW
09-28-2023 12:23:47 RAM 8563/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [20%@1189,14%@1192,25%@1190,3%@1187,46%@1190,2%@1189,9%@1189,11%@1189] EMC_FREQ 64%@1600 GR3D_FREQ 84%@1377 VIC_FREQ 34%@115 APE 150 AUX@49C CPU@52.5C thermal@51.85C Tboard@50C AO@49.5C GPU@55.5C Tdiode@51.75C PMIC@50C GPU 10449mW/7469mW CPU 905mW/1042mW SOC 2712mW/2692mW CV 0mW/0mW VDDRQ 1806mW/1639mW SYS5V 4151mW/4020mW
09-28-2023 12:23:48 RAM 8563/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [34%@1190,27%@1189,17%@1190,20%@1190,34%@1189,6%@1190,21%@1190,21%@1189] EMC_FREQ 61%@1600 GR3D_FREQ 70%@1377 VIC_FREQ 36%@115 APE 150 AUX@49C CPU@52.5C thermal@52.3C Tboard@50C AO@50C GPU@55.5C Tdiode@52C PMIC@50C GPU 11748mW/7477mW CPU 1004mW/1042mW SOC 2813mW/2692mW CV 0mW/0mW VDDRQ 2106mW/1640mW SYS5V 4301mW/4021mW
09-28-2023 12:23:49 RAM 8564/63219MB (lfb 12408x4MB) SWAP 0/31610MB (cached 0MB) CPU [29%@1190,40%@1190,9%@1189,6%@1190,19%@1190,26%@1190,16%@1190,24%@1190] EMC_FREQ 58%@1600 GR3D_FREQ 92%@1377 VIC_FREQ 46%@115 APE 150 AUX@49C CPU@52.5C thermal@52C Tboard@50C AO@50C GPU@56C Tdiode@52C PMIC@50C GPU 11153mW/7485mW CPU 1005mW/1042mW SOC 2813mW/2692mW CV 0mW/0mW VDDRQ 2005mW/1641mW SYS5V 4223mW/4021mW
AGX Xavier 64GB Industrial (Forecr) - Jetpack 5.1 - power mode 30W ALL - batch size 8 - INT8 - 4 streams - RTSP restream off
**PERF: {'stream0': 16.79, 'stream1': 16.79, 'stream2': 7.99, 'stream3': 10.79}
**PERF: {'stream0': 17.19, 'stream1': 16.99, 'stream2': 8.59, 'stream3': 12.19}
**PERF: {'stream0': 14.59, 'stream1': 16.79, 'stream2': 6.39, 'stream3': 9.19}
**PERF: {'stream0': 16.79, 'stream1': 16.99, 'stream2': 8.79, 'stream3': 7.39}
**PERF: {'stream0': 16.99, 'stream1': 17.19, 'stream2': 5.6, 'stream3': 10.59}
**PERF: {'stream0': 12.39, 'stream1': 17.19, 'stream2': 0.8, 'stream3': 2.0}
**PERF: {'stream0': 0.8, 'stream1': 19.78, 'stream2': 0.0, 'stream3': 0.6}
**PERF: {'stream0': 12.99, 'stream1': 17.59, 'stream2': 0.4, 'stream3': 0.2}
**PERF: {'stream0': 9.59, 'stream1': 16.79, 'stream2': 5.2, 'stream3': 2.6}
09-28-2023 10:53:25 RAM 14270/63217MB (lfb 6765x4MB) SWAP 0/31608MB (cached 0MB) CPU [38%@1190,36%@1189,23%@1190,21%@1191,24%@1190,22%@1190,1%@1190,4%@1189] EMC_FREQ 41%@1600 GR3D_FREQ 85%@905 NVENC 115 NVENC1 115 VIC_FREQ 70%@230 APE 150 AUX@47.5C CPU@49C thermal@48.05C Tboard@47C AO@48C GPU@48C Tdiode@49.5C PMIC@50C GPU 3721mW/3656mW CPU 1026mW/1090mW SOC 2823mW/2823mW CV 0mW/0mW VDDRQ 1795mW/1731mW SYS5V 3554mW/3554mW
09-28-2023 10:53:26 RAM 14270/63217MB (lfb 6765x4MB) SWAP 0/31608MB (cached 0MB) CPU [32%@1191,25%@1190,24%@1190,15%@1190,33%@1190,30%@1185,3%@1190,2%@1190] EMC_FREQ 44%@1600 GR3D_FREQ 1%@905 NVENC 115 NVENC1 115 VIC_FREQ 73%@204 APE 150 AUX@47.5C CPU@49.5C thermal@47.75C Tboard@47C AO@48C GPU@48C Tdiode@49.5C PMIC@50C GPU 3977mW/3763mW CPU 1154mW/1111mW SOC 2951mW/2865mW CV 0mW/0mW VDDRQ 1923mW/1795mW SYS5V 3634mW/3580mW
09-28-2023 10:53:27 RAM 14270/63217MB (lfb 6765x4MB) SWAP 0/31608MB (cached 0MB) CPU [38%@1190,30%@1190,25%@1191,30%@1190,29%@1189,25%@1190,6%@1190,13%@1190] EMC_FREQ 44%@1600 GR3D_FREQ 18%@905 NVENC 115 NVENC1 115 VIC_FREQ 68%@307 APE 150 AUX@47.5C CPU@49.5C thermal@47.95C Tboard@47C AO@48C GPU@48C Tdiode@49.5C PMIC@50C GPU 4106mW/3849mW CPU 1154mW/1122mW SOC 2951mW/2887mW CV 0mW/0mW VDDRQ 1923mW/1827mW SYS5V 3634mW/3594mW
09-28-2023 10:53:28 RAM 14270/63217MB (lfb 6765x4MB) SWAP 0/31608MB (cached 0MB) CPU [35%@1190,30%@1189,21%@1190,20%@1190,29%@1189,16%@1188,4%@1190,3%@1190] EMC_FREQ 43%@1600 GR3D_FREQ 32%@905 NVENC 115 NVENC1 115 VIC_FREQ 68%@192 APE 150 AUX@47.5C CPU@49C thermal@48.1C Tboard@47C AO@48C GPU@47.5C Tdiode@49.5C PMIC@50C GPU 3721mW/3823mW CPU 1026mW/1102mW SOC 2823mW/2874mW CV 0mW/0mW VDDRQ 1795mW/1820mW SYS5V 3594mW/3594mW
09-28-2023 10:53:29 RAM 14270/63217MB (lfb 6765x4MB) SWAP 0/31608MB (cached 0MB) CPU [30%@1188,24%@1192,18%@1190,18%@1190,25%@1190,22%@1190,1%@1190,0%@1189] EMC_FREQ 42%@1600 GR3D_FREQ 44%@905 NVENC 115 NVENC1 115 VIC_FREQ 70%@204 APE 150 AUX@47C CPU@49C thermal@47.95C Tboard@47C AO@48.5C GPU@47.5C Tdiode@49.5C PMIC@50C GPU 3592mW/3784mW CPU 1026mW/1090mW SOC 2823mW/2865mW CV 0mW/0mW VDDRQ 1667mW/1795mW SYS5V 3554mW/3587mW
09-28-2023 10:53:30 RAM 14269/63217MB (lfb 6765x4MB) SWAP 0/31608MB (cached 0MB) CPU [32%@1190,27%@1190,27%@1188,20%@1190,27%@1189,21%@1187,6%@1190,17%@1193] EMC_FREQ 41%@1600 GR3D_FREQ 90%@905 NVENC 115 NVENC1 115 VIC_FREQ 67%@243 APE 150 AUX@47.5C CPU@49C thermal@47.75C Tboard@47C AO@48C GPU@48C Tdiode@49.5C PMIC@50C GPU 3464mW/3739mW CPU 1154mW/1099mW SOC 2824mW/2859mW CV 0mW/0mW VDDRQ 1667mW/1776mW SYS5V 3554mW/3582mW
09-28-2023 10:53:31 RAM 14270/63217MB (lfb 6765x4MB) SWAP 0/31608MB (cached 0MB) CPU [33%@1190,21%@1190,25%@1190,22%@1190,20%@1190,23%@1189,0%@1190,0%@1190] EMC_FREQ 43%@1600 GR3D_FREQ 0%@905 NVENC 115 NVENC1 115 VIC_FREQ 75%@140 APE 150 AUX@47.5C CPU@49C thermal@47.75C Tboard@47C AO@48C GPU@47.5C Tdiode@49.5C PMIC@50C GPU 3721mW/3736mW CPU 1026mW/1090mW SOC 2823mW/2855mW CV 0mW/0mW VDDRQ 1795mW/1779mW SYS5V 3554mW/3579mW
09-28-2023 10:53:32 RAM 14270/63217MB (lfb 6765x4MB) SWAP 0/31608MB (cached 0MB) CPU [36%@1190,33%@1189,27%@1184,12%@1189,23%@1189,28%@1190,5%@1190,6%@1190] EMC_FREQ 42%@1600 GR3D_FREQ 56%@905 NVENC 115 NVENC1 115 VIC_FREQ 73%@217 APE 150 AUX@47C CPU@49C thermal@47.95C Tboard@47C AO@48C GPU@48C Tdiode@49.5C PMIC@50C GPU 3721mW/3735mW CPU 1154mW/1097mW SOC 2823mW/2851mW CV 0mW/0mW VDDRQ 1795mW/1780mW SYS5V 3594mW/3580mW
AGX Orin Developer board - Jetpack 5.0.2 - power mode 30W (8 cores) - batch size 8 - INT8 - 4 streams - RTSP restream off
**PERF: {'stream0': 24.97, 'stream1': 24.97, 'stream2': 25.17, 'stream3': 24.97}
**PERF: {'stream0': 24.98, 'stream1': 24.98, 'stream2': 24.78, 'stream3': 24.78}
**PERF: {'stream0': 24.58, 'stream1': 24.98, 'stream2': 24.98, 'stream3': 24.78}
**PERF: {'stream0': 23.98, 'stream1': 24.98, 'stream2': 24.98, 'stream3': 24.98}
AGX Orin Developer board - Jetpack 5.0.2 - power mode 30W (8 cores) - batch size 8 - INT8 - 4 streams - RTSP restream on
**PERF: {'stream0': 19.78, 'stream1': 19.58, 'stream2': 19.58, 'stream3': 19.38}
**PERF: {'stream0': 18.98, 'stream1': 18.98, 'stream2': 18.98, 'stream3': 18.78}
**PERF: {'stream0': 18.98, 'stream1': 19.18, 'stream2': 18.98, 'stream3': 18.78}
**PERF: {'stream0': 18.99, 'stream1': 18.79, 'stream2': 18.59, 'stream3': 18.39}
**PERF: {'stream0': 19.18, 'stream1': 19.18, 'stream2': 19.18, 'stream3': 18.98}
To me it looks like the pgie batch size isn’t really the deciding factor, as the spiky performance on the Xaviers occurs at batch sizes 4, 8 and 16 alike on both Xavier devices. Turning the RTSP restream on or off only really affects the Orin device, where it reduces the FPS per stream by about 5 frames. On both Xavier devices the restream setting makes little difference: their throughput stays spiky and varies a lot either way.
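To put a number on the spikiness instead of judging it by eye, the **PERF lines can be summarized per stream. The following is a rough sketch that assumes the log format shown above, with a Python dict literal after the `**PERF:` prefix. The function name `summarize_perf` is made up for this example.

```python
import ast
import statistics


def summarize_perf(lines):
    """Return {stream: (mean, stdev, min)} from '**PERF: {...}' log lines.

    Lines without the '**PERF:' marker (e.g. tegrastats output) are skipped.
    """
    samples = {}
    for line in lines:
        _, marker, payload = line.partition("**PERF:")
        if not marker:
            continue
        # The payload is a plain dict literal, so literal_eval is sufficient.
        for stream, fps in ast.literal_eval(payload.strip()).items():
            samples.setdefault(stream, []).append(float(fps))
    return {
        stream: (
            round(statistics.mean(values), 2),
            round(statistics.pstdev(values), 2),
            min(values),
        )
        for stream, values in samples.items()
    }


if __name__ == "__main__":
    # Two lines copied from the Auvidea Xavier log above.
    log = [
        "**PERF: {'stream0': 1.2, 'stream1': 19.98, 'stream2': 0.0, 'stream3': 0.0}",
        "**PERF: {'stream0': 7.59, 'stream1': 17.39, 'stream2': 4.0, 'stream3': 0.8}",
    ]
    for stream, (mean, stdev, lowest) in summarize_perf(log).items():
        print(f"{stream}: mean={mean} stdev={stdev} min={lowest}")
```

Running this over a full log for each device and setting gives one standard deviation per stream, which makes it easier to compare runs (for example batch size 4 against 16) than reading the raw lines.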