YOLOv5 slow inference on Jetson Xavier NX16

I have finally gotten my Xavier NX16 up and running, and I want to test my YOLOv5 model, which I normally run on my Windows PC.

On Windows I have been using:
Python3.9
Torch==1.11.0
Torchvision==0.12.0
No CUDA, since I am running on Intel integrated graphics

On the Xavier NX16 I am using:
JetPack 4.6 (Rev3), as requested by Auvidea for my carrier board (Ubuntu 18.04, which means Python 3.6 is the default)
Python3.9
Torch==1.11.0, built from source because it was hard to find a wheel for an aarch64 processor with CUDA and Python 3.9
Torchvision==0.12.0, built from source for the same reason
Using CUDA

My problem is that on the PC I get inference times of around 0.25-0.75 seconds, but on the Xavier a single inference takes more than 4 seconds.

When I load my custom model using torch hub, I get the following information:
12:37:11.364 INFO YOLOv5 🚀 v6.1-37-g3f634d4 torch 1.11.0a0+gitbc2c6ed CUDA:0 (Xavier, 15825MiB)

12:37:16.207 INFO Fusing layers…
12:37:17.478 INFO Model Summary: 213 layers, 7015519 parameters, 0 gradients
12:37:17.488 INFO Adding AutoShape…

Any idea why things are so slow?

Hi,

Have you maximized the device performance first?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Also, please monitor the device status with the following command to see whether the GPU resources are fully utilized.

$ sudo tegrastats

Thanks.

@AastaLLL
Thanks, I will check it out and report back with the results.

Edit:
I did what you suggested, and it reduced the processing time to around two and a half seconds.
Here is my output from tegrastats. I was not able to freeze it after running the YOLO detection, so I don’t know what to look for.

RAM 5672/15825MB (lfb 1331x4MB) SWAP 0/7913MB (cached 0MB) CPU [78%@1907,98%@1907,off,off,off,off] EMC_FREQ 6%@1600 GR3D_FREQ 3%@1109 APE 150 MTS fg 3% bg 2% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.8C VDD_IN 5741/5741 VDD_CPU_GPU_CV 1950/1950 VDD_SOC 1154/1154
RAM 5882/15825MB (lfb 1327x4MB) SWAP 0/7913MB (cached 0MB) CPU [95%@1907,93%@1907,off,off,off,off] EMC_FREQ 7%@1600 GR3D_FREQ 3%@1109 APE 150 MTS fg 5% bg 2% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.8C VDD_IN 6060/5900 VDD_CPU_GPU_CV 2149/2049 VDD_SOC 1194/1174
RAM 6068/15825MB (lfb 1320x4MB) SWAP 0/7913MB (cached 0MB) CPU [96%@1907,85%@1907,off,off,off,off] EMC_FREQ 7%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 4% bg 2% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.65C VDD_IN 5940/5913 VDD_CPU_GPU_CV 2070/2056 VDD_SOC 1194/1180
RAM 6176/15825MB (lfb 1317x4MB) SWAP 0/7913MB (cached 0MB) CPU [100%@1907,82%@1907,off,off,off,off] EMC_FREQ 6%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 4% bg 0% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.65C VDD_IN 5821/5890 VDD_CPU_GPU_CV 2030/2049 VDD_SOC 1154/1174
RAM 6264/15825MB (lfb 1309x4MB) SWAP 0/7913MB (cached 0MB) CPU [99%@1907,100%@1907,off,off,off,off] EMC_FREQ 5%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 4% bg 0% AO@36C GPU@36.5C PMIC@50C AUX@37C CPU@39.5C thermal@37.65C VDD_IN 5861/5884 VDD_CPU_GPU_CV 2070/2053 VDD_SOC 1154/1170
RAM 6286/15825MB (lfb 1305x4MB) SWAP 0/7913MB (cached 0MB) CPU [100%@1907,69%@1907,off,off,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 4% bg 0% AO@36C GPU@36.5C PMIC@50C AUX@37C CPU@39C thermal@37.6C VDD_IN 5542/5827 VDD_CPU_GPU_CV 1910/2029 VDD_SOC 1114/1160
RAM 6286/15825MB (lfb 1305x4MB) SWAP 0/7913MB (cached 0MB) CPU [58%@1907,100%@1907,off,off,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 3% bg 26% AO@36C GPU@36C PMIC@50C AUX@37C CPU@39C thermal@37.6C VDD_IN 5582/5792 VDD_CPU_GPU_CV 1910/2012 VDD_SOC 1114/1154
RAM 6349/15825MB (lfb 1304x4MB) SWAP 0/7913MB (cached 0MB) CPU [30%@1907,100%@1907,off,off,off,off] EMC_FREQ 3%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 1% bg 16% AO@36C GPU@36C PMIC@50C AUX@37C CPU@39C thermal@37.65C VDD_IN 5382/5741 VDD_CPU_GPU_CV 1751/1980 VDD_SOC 1114/1149
RAM 6735/15825MB (lfb 1291x4MB) SWAP 0/7913MB (cached 0MB) CPU [84%@1907,91%@1907,off,off,off,off] EMC_FREQ 7%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 4% bg 4% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.3C VDD_IN 5940/5763 VDD_CPU_GPU_CV 1990/1981 VDD_SOC 1234/1158
RAM 6771/15825MB (lfb 1288x4MB) SWAP 0/7913MB (cached 0MB) CPU [43%@1907,33%@1907,off,off,off,off] EMC_FREQ 7%@1600 GR3D_FREQ 7%@1109 APE 150 MTS fg 2% bg 7% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 5741/5761 VDD_CPU_GPU_CV 1552/1938 VDD_SOC 1234/1166
RAM 6772/15825MB (lfb 1288x4MB) SWAP 0/7913MB (cached 0MB) CPU [44%@1907,41%@1907,off,off,off,off] EMC_FREQ 9%@1600 GR3D_FREQ 7%@1109 APE 150 MTS fg 2% bg 12% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@38.5C thermal@37.5C VDD_IN 5502/5737 VDD_CPU_GPU_CV 1353/1885 VDD_SOC 1234/1172
RAM 6778/15825MB (lfb 1288x4MB) SWAP 0/7913MB (cached 0MB) CPU [55%@1907,86%@1907,off,off,off,off] EMC_FREQ 10%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 2% bg 9% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.65C VDD_IN 6249/5780 VDD_CPU_GPU_CV 2109/1903 VDD_SOC 1232/1177
RAM 6774/15825MB (lfb 1288x4MB) SWAP 0/7913MB (cached 0MB) CPU [54%@1907,100%@1907,off,off,off,off] EMC_FREQ 10%@1600 GR3D_FREQ 8%@1109 APE 150 MTS fg 2% bg 26% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.8C VDD_IN 6448/5831 VDD_CPU_GPU_CV 2308/1934 VDD_SOC 1232/1181
RAM 6779/15825MB (lfb 1288x4MB) SWAP 0/7913MB (cached 0MB) CPU [50%@1907,100%@1907,off,off,off,off] EMC_FREQ 10%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 2% bg 23% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.65C VDD_IN 6409/5872 VDD_CPU_GPU_CV 2308/1961 VDD_SOC 1232/1185
RAM 6784/15825MB (lfb 1288x4MB) SWAP 0/7913MB (cached 0MB) CPU [45%@1907,32%@1907,off,off,off,off] EMC_FREQ 10%@1600 GR3D_FREQ 8%@1109 APE 150 MTS fg 1% bg 3% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 5582/5853 VDD_CPU_GPU_CV 1433/1926 VDD_SOC 1234/1188
RAM 6795/15825MB (lfb 1287x4MB) SWAP 0/7913MB (cached 0MB) CPU [97%@1907,52%@1907,off,off,off,off] EMC_FREQ 12%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 2% bg 0% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.65C VDD_IN 6090/5868 VDD_CPU_GPU_CV 1791/1917 VDD_SOC 1271/1193
RAM 6790/15825MB (lfb 1287x4MB) SWAP 0/7913MB (cached 0MB) CPU [67%@1907,79%@1907,off,off,off,off] EMC_FREQ 13%@1600 GR3D_FREQ 8%@1109 APE 150 MTS fg 2% bg 2% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.45C VDD_IN 6130/5883 VDD_CPU_GPU_CV 1870/1914 VDD_SOC 1271/1198
RAM 6787/15825MB (lfb 1287x4MB) SWAP 0/7913MB (cached 0MB) CPU [61%@1907,100%@1907,off,off,off,off] EMC_FREQ 13%@1600 GR3D_FREQ 5%@1109 APE 150 MTS fg 2% bg 4% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.65C VDD_IN 6329/5908 VDD_CPU_GPU_CV 2030/1921 VDD_SOC 1271/1202
RAM 6882/15825MB (lfb 1287x4MB) SWAP 0/7913MB (cached 0MB) CPU [62%@1907,85%@1907,off,off,off,off] EMC_FREQ 14%@1600 GR3D_FREQ 2%@1109 APE 150 MTS fg 1% bg 5% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.65C VDD_IN 6210/5924 VDD_CPU_GPU_CV 1910/1920 VDD_SOC 1271/1205
RAM 7214/15825MB (lfb 1278x4MB) SWAP 0/7913MB (cached 0MB) CPU [75%@1907,59%@1907,off,off,off,off] EMC_FREQ 14%@1600 GR3D_FREQ 3%@1109 APE 150 MTS fg 2% bg 2% AO@36C GPU@36.5C PMIC@50C AUX@37C CPU@39C thermal@37.3C VDD_IN 6170/5936 VDD_CPU_GPU_CV 1870/1918 VDD_SOC 1273/1209
RAM 7478/15825MB (lfb 1269x4MB) SWAP 0/7913MB (cached 0MB) CPU [75%@1907,63%@1907,off,off,off,off] EMC_FREQ 15%@1600 GR3D_FREQ 5%@1109 APE 150 MTS fg 2% bg 3% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.3C VDD_IN 6807/5977 VDD_CPU_GPU_CV 2348/1938 VDD_SOC 1351/1215
RAM 7492/15825MB (lfb 1268x4MB) SWAP 0/7913MB (cached 0MB) CPU [81%@1907,90%@1907,off,off,off,off] EMC_FREQ 13%@1600 GR3D_FREQ 7%@1109 APE 150 MTS fg 4% bg 11% AO@36C GPU@36.5C PMIC@50C AUX@37C CPU@39C thermal@37.5C VDD_IN 6369/5995 VDD_CPU_GPU_CV 2109/1946 VDD_SOC 1271/1218
RAM 7490/15825MB (lfb 1268x4MB) SWAP 0/7913MB (cached 0MB) CPU [62%@1907,58%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 2% bg 2% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@38.5C thermal@37.5C VDD_IN 5582/5977 VDD_CPU_GPU_CV 1433/1924 VDD_SOC 1234/1219
RAM 7490/15825MB (lfb 1268x4MB) SWAP 0/7913MB (cached 0MB) CPU [47%@1907,46%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 7%@1109 APE 150 MTS fg 2% bg 2% AO@36C GPU@36C PMIC@50C AUX@37C CPU@39C thermal@37.15C VDD_IN 5781/5969 VDD_CPU_GPU_CV 1552/1908 VDD_SOC 1234/1219
RAM 7496/15825MB (lfb 1264x4MB) SWAP 0/7913MB (cached 0MB) CPU [95%@1907,92%@1907,off,off,off,off] EMC_FREQ 12%@1600 GR3D_FREQ 7%@1109 APE 150 MTS fg 5% bg 2% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 6409/5987 VDD_CPU_GPU_CV 2070/1915 VDD_SOC 1311/1223
RAM 7494/15825MB (lfb 1263x4MB) SWAP 0/7913MB (cached 0MB) CPU [82%@1907,89%@1907,off,off,off,off] EMC_FREQ 13%@1600 GR3D_FREQ 36%@1109 APE 150 MTS fg 3% bg 15% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 6846/6020 VDD_CPU_GPU_CV 2468/1936 VDD_SOC 1311/1226
RAM 7522/15825MB (lfb 1259x4MB) SWAP 0/7913MB (cached 0MB) CPU [90%@1907,95%@1907,off,off,off,off] EMC_FREQ 14%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 3% bg 8% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.3C VDD_IN 6687/6044 VDD_CPU_GPU_CV 2348/1951 VDD_SOC 1311/1229
RAM 7537/15825MB (lfb 1258x4MB) SWAP 0/7913MB (cached 0MB) CPU [97%@1907,99%@1907,off,off,off,off] EMC_FREQ 13%@1600 GR3D_FREQ 7%@1109 APE 150 MTS fg 4% bg 2% AO@36C GPU@36C PMIC@50C AUX@37C CPU@39C thermal@37.3C VDD_IN 6448/6059 VDD_CPU_GPU_CV 2189/1960 VDD_SOC 1271/1231
RAM 7532/15825MB (lfb 1258x4MB) SWAP 0/7913MB (cached 0MB) CPU [41%@1907,38%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 1% bg 4% AO@36C GPU@36.5C PMIC@50C AUX@37C CPU@39C thermal@37.5C VDD_IN 5382/6035 VDD_CPU_GPU_CV 1273/1936 VDD_SOC 1234/1231
RAM 7532/15825MB (lfb 1257x4MB) SWAP 0/7913MB (cached 0MB) CPU [73%@1907,67%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 5%@1109 APE 150 MTS fg 3% bg 6% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 6011/6035 VDD_CPU_GPU_CV 1791/1931 VDD_SOC 1273/1232
RAM 7532/15825MB (lfb 1257x4MB) SWAP 0/7913MB (cached 0MB) CPU [34%@1907,31%@1907,off,off,off,off] EMC_FREQ 10%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 1% bg 4% AO@36C GPU@36C PMIC@50C AUX@37C CPU@38.5C thermal@37.65C VDD_IN 5342/6012 VDD_CPU_GPU_CV 1194/1907 VDD_SOC 1234/1232
RAM 7538/15825MB (lfb 1257x4MB) SWAP 0/7913MB (cached 0MB) CPU [57%@1907,62%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 6%@1109 APE 150 MTS fg 2% bg 4% AO@36C GPU@36C PMIC@50C AUX@37C CPU@39C thermal@37.15C VDD_IN 6249/6020 VDD_CPU_GPU_CV 2030/1911 VDD_SOC 1271/1233
RAM 7532/15825MB (lfb 1256x4MB) SWAP 0/7913MB (cached 0MB) CPU [90%@1907,89%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 4% bg 3% AO@36C GPU@36C PMIC@50C AUX@37C CPU@39C thermal@37.3C VDD_IN 6528/6035 VDD_CPU_GPU_CV 2308/1923 VDD_SOC 1271/1235
RAM 7532/15825MB (lfb 1256x4MB) SWAP 0/7913MB (cached 0MB) CPU [80%@1907,98%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 3% bg 10% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.5C VDD_IN 6568/6051 VDD_CPU_GPU_CV 2348/1936 VDD_SOC 1271/1236
RAM 7532/15825MB (lfb 1255x4MB) SWAP 0/7913MB (cached 0MB) CPU [71%@1907,99%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 3% bg 13% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 6647/6068 VDD_CPU_GPU_CV 2428/1950 VDD_SOC 1271/1237
RAM 7534/15825MB (lfb 1255x4MB) SWAP 0/7913MB (cached 0MB) CPU [70%@1907,96%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 5%@1109 APE 150 MTS fg 3% bg 13% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 6488/6079 VDD_CPU_GPU_CV 2308/1960 VDD_SOC 1232/1237
RAM 7532/15825MB (lfb 1254x4MB) SWAP 0/7913MB (cached 0MB) CPU [86%@1907,91%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 0%@1109 APE 150 MTS fg 4% bg 6% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 6528/6091 VDD_CPU_GPU_CV 2269/1968 VDD_SOC 1271/1237
RAM 7538/15825MB (lfb 1254x4MB) SWAP 0/7913MB (cached 0MB) CPU [84%@1907,77%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 7%@1109 APE 150 MTS fg 3% bg 8% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.8C VDD_IN 6448/6101 VDD_CPU_GPU_CV 2269/1976 VDD_SOC 1232/1237
RAM 7532/15825MB (lfb 1254x4MB) SWAP 0/7913MB (cached 0MB) CPU [54%@1907,99%@1907,off,off,off,off] EMC_FREQ 10%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 2% bg 10% AO@36C GPU@36C PMIC@50C AUX@37C CPU@39.5C thermal@37.65C VDD_IN 6488/6111 VDD_CPU_GPU_CV 2308/1984 VDD_SOC 1232/1237
RAM 7532/15825MB (lfb 1253x4MB) SWAP 0/7913MB (cached 0MB) CPU [65%@1907,100%@1907,off,off,off,off] EMC_FREQ 10%@1600 GR3D_FREQ 5%@1109 APE 150 MTS fg 3% bg 7% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39C thermal@37.8C VDD_IN 6448/6119 VDD_CPU_GPU_CV 2308/1992 VDD_SOC 1234/1237
RAM 7532/15825MB (lfb 1253x4MB) SWAP 0/7913MB (cached 0MB) CPU [73%@1907,100%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 3% bg 9% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.8C VDD_IN 6568/6130 VDD_CPU_GPU_CV 2348/2001 VDD_SOC 1234/1237
RAM 7532/15825MB (lfb 1252x4MB) SWAP 0/7913MB (cached 0MB) CPU [82%@1907,85%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 5%@1109 APE 150 MTS fg 3% bg 7% AO@36C GPU@36C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.5C VDD_IN 6568/6140 VDD_CPU_GPU_CV 2348/2009 VDD_SOC 1232/1237
RAM 7532/15825MB (lfb 1252x4MB) SWAP 0/7913MB (cached 0MB) CPU [75%@1907,89%@1907,off,off,off,off] EMC_FREQ 11%@1600 GR3D_FREQ 1%@1109 APE 150 MTS fg 3% bg 8% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39C thermal@37.5C VDD_IN 6488/6149 VDD_CPU_GPU_CV 2269/2015 VDD_SOC 1232/1237
RAM 7533/15825MB (lfb 1251x4MB) SWAP 0/7913MB (cached 0MB) CPU [85%@1907,99%@1907,off,off,off,off] EMC_FREQ 12%@1600 GR3D_FREQ 6%@1109 APE 150 MTS fg 4% bg 5% AO@36C GPU@36.5C PMIC@50C AUX@37.5C CPU@39.5C thermal@37.65C VDD_IN 6727/6162 VDD_CPU_GPU_CV 2388/2024 VDD_SOC 1271/1237
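
For anyone reading a dump like this: the GR3D_FREQ field is the GPU load and the CPU [...] field lists the load of each core ("off" means the core is disabled). A minimal Python sketch for summarizing those two fields follows. The function names are made up for illustration, and the two sample strings are abridged excerpts of lines from the log above.

```python
import re

# Abridged excerpts of two lines from the log above; unused fields are omitted.
SAMPLE = [
    "CPU [78%@1907,98%@1907,off,off,off,off] EMC_FREQ 6%@1600 GR3D_FREQ 3%@1109 APE 150",
    "CPU [82%@1907,89%@1907,off,off,off,off] EMC_FREQ 13%@1600 GR3D_FREQ 36%@1109 APE 150",
]

GPU_RE = re.compile(r"GR3D_FREQ (\d+)%")   # GPU load in percent
CPU_RE = re.compile(r"CPU \[([^\]]*)\]")   # the bracketed per-core list
CORE_RE = re.compile(r"(\d+)%@\d+")        # one online core: load%@MHz


def parse_line(line):
    """Return GPU load and per-core CPU loads for one tegrastats line, or None."""
    gpu = GPU_RE.search(line)
    cpu = CPU_RE.search(line)
    if gpu is None or cpu is None:
        return None
    # Cores reported as "off" do not match CORE_RE and are skipped.
    cores = [int(pct) for pct in CORE_RE.findall(cpu.group(1))]
    return {"gpu_pct": int(gpu.group(1)), "cpu_pcts": cores}


def summarize(lines):
    """Average/peak GPU load and average load of the online CPU cores."""
    samples = [s for s in (parse_line(line) for line in lines) if s is not None]
    if not samples:
        raise ValueError("no tegrastats samples found")
    gpu = [s["gpu_pct"] for s in samples]
    cpu = [pct for s in samples for pct in s["cpu_pcts"]]
    return {
        "samples": len(samples),
        "gpu_avg": sum(gpu) / len(gpu),
        "gpu_max": max(gpu),
        "cpu_avg": sum(cpu) / len(cpu) if cpu else 0.0,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A low GPU average next to CPU cores that sit near 100% usually suggests that the work is CPU-bound. Note that in the log above only two of the six cores are online.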

On a side note, I actually load multiple models onto my CUDA device at the same time, but the detections are not run simultaneously. Could this have an impact on performance?

Hi,

The GPU usage is quite low (~10%), so it looks like the application is blocked by other tasks.
The most common bottleneck is camera frame pre-processing with OpenCV.
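
One way to check where the time goes is to time each stage of the pipeline separately. A minimal sketch, assuming a two-stage pipeline; the stage names are hypothetical and the sleep calls stand in for the real pre-processing and inference code:

```python
import time
from collections import defaultdict
from contextlib import contextmanager


@contextmanager
def stage(name, totals):
    """Add the wall-clock time spent inside the with-block to totals[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start


def run_pipeline(n_frames=3):
    """Run a fake pipeline and return the accumulated seconds per stage."""
    totals = defaultdict(float)
    for _ in range(n_frames):
        with stage("preprocess", totals):
            time.sleep(0.05)   # placeholder for e.g. OpenCV resize/convert
        with stage("inference", totals):
            time.sleep(0.005)  # placeholder for the model call
    return dict(totals)


if __name__ == "__main__":
    for name, seconds in run_pipeline().items():
        print(f"{name}: {seconds:.3f} s")
```

When the inference stage runs on the GPU, the call may return before the GPU has finished, so a synchronization call (for PyTorch, torch.cuda.synchronize()) inside the timed block is needed for the numbers to be meaningful.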

Have you run the inference with TensorRT?
If not, could you give it a try?

Thanks.

@AastaLLL
I tried, but as you can see in my other thread, building TensorRT on my Jetson broke my Python libraries, so I will test it once I get that resolved.

Hi

Does your application depend on python3.9?
If not, you can try it with the default python3.8 (JetPack 5.0.2).

Thanks.

As described, I use JetPack 4.6.0 Rev3, as suggested by Auvidea, who made the carrier board for my Jetson Xavier NX16.

The problem is not really which Python version I use; it worked fine. But after building and altinstalling Python 3.9.14, the same version I already had installed from the deadsnakes PPA, things got messed up, and I can’t seem to reinstall Python 3.9.14 or fix its links to the correct directories.

Hi,

Do you have the ONNX format of the YOLOv5 model?
If yes, would you mind testing the TensorRT inference time with the following command first?

$ /usr/src/tensorrt/bin/trtexec --onnx=[model]

Thanks.

Good news @AastaLLL
I finally reinstalled Python 3.9.14 and built TensorRT so that I could install it from a .whl.
I exported my YOLOv5 model to .engine format and loaded it with torch hub. My inference time in 20W 6-core mode, without jetson_clocks, is down to 0.24 seconds on a 1260x810 image.

Unfortunately, the TensorRT engine takes up my entire GPU, and this has increased the inference time of the other model in the system. Is there a way to limit the amount of memory consumed by TensorRT?

If I understand correctly, I have to set a memory limit for the engine when I generate it? From what I can see in the Python API, IBuilderConfig has a function called set_memory_pool_limit(). Should I just plug that into my engine builder?
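
A hedged sketch of what that could look like. As far as I know, set_memory_pool_limit() only exists in newer TensorRT releases (8.4 and later), while older releases such as the 8.0.x that ships with JetPack 4.6 expose the workspace limit as the IBuilderConfig.max_workspace_size attribute, so it is worth checking tensorrt.__version__ first. The helper name limit_workspace is made up, and the two small classes are stand-ins so the snippet runs without TensorRT installed. Also note that this limit caps the scratch memory TensorRT may use; it is not a cap on GPU utilization.

```python
def limit_workspace(config, limit_bytes, workspace_pool=None):
    """Apply a workspace limit using whichever builder-config API is present.

    config         -- an IBuilderConfig-like object
    limit_bytes    -- workspace limit in bytes
    workspace_pool -- pass trt.MemoryPoolType.WORKSPACE on newer TensorRT
    Returns the name of the API that was used.
    """
    if workspace_pool is not None and hasattr(config, "set_memory_pool_limit"):
        config.set_memory_pool_limit(workspace_pool, limit_bytes)
        return "set_memory_pool_limit"
    # Older API: a plain attribute on the builder config.
    config.max_workspace_size = limit_bytes
    return "max_workspace_size"


class OldStyleConfig:
    """Stand-in for a builder config that only has max_workspace_size."""
    max_workspace_size = 0


class NewStyleConfig:
    """Stand-in for a builder config that has set_memory_pool_limit()."""
    def __init__(self):
        self.limits = {}

    def set_memory_pool_limit(self, pool, pool_size):
        self.limits[pool] = pool_size


if __name__ == "__main__":
    one_gib = 1 << 30
    print(limit_workspace(OldStyleConfig(), one_gib))
    print(limit_workspace(NewStyleConfig(), one_gib, workspace_pool="WORKSPACE"))
    # With real TensorRT (assumption, not run here):
    #   config = builder.create_builder_config()
    #   limit_workspace(config, one_gib, trt.MemoryPoolType.WORKSPACE)
```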

Added info:
I installed jetson-stats to monitor my GPU usage, and as soon as the YOLOv5 detection is done, it goes back to 10-25% reserved by the different TensorRT components.
I am unsure how to proceed with troubleshooting this behavior. Maybe the delay/bottleneck happens when I send my input to the GPU for detection with the second model on the GPU.

Added info2:
So I tried running the inference a second time after initializing the script and running the first inference, and all of a sudden the inference time of both models dropped to nearly 0.1 seconds.
I would count this as my problem being solved, but I would like to understand how and why the GPU needs a warm-up run for inference.
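
A likely explanation, offered as a general observation rather than a diagnosis of this setup: the first call typically pays one-time costs such as creating the CUDA context, loading kernels, allocating buffers and, depending on the backend, selecting algorithms, so it is common to discard the first run(s) when measuring. A minimal sketch of that measurement pattern follows; FakeModel is a stand-in that is slow only on its first call, and the helper names are made up.

```python
import time


class FakeModel:
    """Stand-in for a model whose first call includes one-time setup."""
    def __init__(self):
        self.initialized = False

    def __call__(self):
        if not self.initialized:
            time.sleep(0.05)    # one-time setup cost (placeholder)
            self.initialized = True
        time.sleep(0.005)       # steady-state inference cost (placeholder)


def time_calls(fn, n_calls=5, sync=None):
    """Return the latency in seconds of each of n_calls calls to fn.

    sync -- optional callable run before the clock is stopped; for PyTorch
            on CUDA this would be torch.cuda.synchronize, because GPU work
            is launched asynchronously.
    """
    latencies = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        latencies.append(time.perf_counter() - start)
    return latencies


def steady_state(latencies, warmup=1):
    """Mean latency after dropping the first `warmup` calls."""
    rest = latencies[warmup:]
    if not rest:
        raise ValueError("need more calls than warm-up runs")
    return sum(rest) / len(rest)


if __name__ == "__main__":
    lat = time_calls(FakeModel(), n_calls=4)
    print(f"first call: {lat[0]:.3f} s, steady state: {steady_state(lat):.3f} s")
```

In an application, the same idea amounts to running one dummy inference per model right after loading, so that the first real frame does not pay the setup cost.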