PyTorch inference slows down after a few iterations

Hi,

Here is a very simplified version of my code. It runs at around 20 ms per iteration for the first 30-40 iterations, then jumps to about 200-240 ms. The model is based on VGG16 with the final dense layers removed:

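import time
import torch

model.eval()  # `model`: the VGG16-based network, already moved to the GPU
x = torch.ones((1, 3, 224, 224)).cuda()
with torch.no_grad():  # inference only: skip building the autograd graph
  for count in range(1000):
    time1 = time.perf_counter()
    model(x)
    torch.cuda.synchronize()  # kernels run asynchronously; wait for them before stopping the clock
    time2 = time.perf_counter()
    print('Time %s' % (time2 - time1))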
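Because CUDA kernels are launched asynchronously, a timer read right after model(x) may capture only the time needed to queue the kernels rather than the time needed to run them. Below is a minimal sketch of a reusable timing helper. `benchmark` is a hypothetical name, not a PyTorch API. It runs a few warm-up calls, calls a caller-supplied `sync` function (for example `torch.cuda.synchronize`) before each clock read, and reports the median. It depends only on the standard library, so it can be tried without a GPU.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object],
              sync: Callable[[], None] = lambda: None,
              warmup: int = 5,
              iters: int = 50) -> dict:
    """Time fn() and return latency statistics in milliseconds.

    `sync` must block until all queued device work has finished;
    otherwise only the launch time of fn() is measured.
    """
    for _ in range(warmup):  # first calls may pay one-time setup costs
        fn()
    sync()  # drain the warm-up work before timing starts
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for the work launched by fn() to complete
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
        "samples_ms": samples,
    }
```

With the loop above, a call would look something like `benchmark(lambda: model(x), sync=torch.cuda.synchronize)`, made inside `torch.no_grad()`. The median is reported because a few slow outliers would distort a mean.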

I am running on a Jetson Nano 2GB. The output from jtop is:
Jetpack: 4.6 [L4T 32.6.1]
CUDA: 10.2.300
OpenCV: 4.5.3 compiled CUDA: YES
TensorRT: 8.0.1.6
cuDNN: 8.2.1.32

Hi,

Would you mind monitoring the system status first?

$ sudo tegrastats

If all the GPU resources are occupied after a few iterations, subsequent inferences have to wait for those resources to be freed first.
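On Jetson boards the CPU and GPU share the same physical RAM, so it can also help to log memory from inside the inference loop and line it up with the per-iteration timings. Below is a minimal, Linux-only sketch that reads `/proc/meminfo`. The function names are hypothetical, and the choice of `MemAvailable` and `SwapTotal - SwapFree` as the signals to watch is an assumption.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])  # first token is the number
    return info


def memory_snapshot(text: str) -> tuple:
    """Return (available_MB, swap_used_MB) from /proc/meminfo text."""
    info = parse_meminfo(text)
    available_mb = info["MemAvailable"] / 1024
    swap_used_mb = (info["SwapTotal"] - info["SwapFree"]) / 1024
    return available_mb, swap_used_mb


def read_snapshot(path: str = "/proc/meminfo") -> tuple:
    """Read the live values (Linux only)."""
    with open(path) as f:
        return memory_snapshot(f.read())
```

Printing `read_snapshot()` every few iterations next to the latency would show whether the slowdown starts when available memory runs out and swap usage begins to grow.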

Thanks.

This is the output of tegrastats while running the app:

RAM 501/1972MB (lfb 20x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1224,0%@1224,0%@1224,0%@1224] EMC_FREQ 0%@1600 GR3D_FREQ 0%@76 APE 25 PLL@31.5C CPU@35.5C PMIC@50C GPU@33.5C AO@38.5C thermal@34.5C
RAM 501/1972MB (lfb 20x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [1%@102,4%@102,0%@102,0%@102] EMC_FREQ 0%@1600 GR3D_FREQ 0%@76 APE 25 PLL@31.5C CPU@35C PMIC@50C GPU@33.5C AO@38C thermal@34.75C
RAM 501/1972MB (lfb 20x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [2%@102,5%@102,0%@204,0%@204] EMC_FREQ 0%@1600 GR3D_FREQ 0%@76 APE 25 PLL@31.5C CPU@35C PMIC@50C GPU@33.5C AO@38C thermal@34.5C
RAM 501/1972MB (lfb 20x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [4%@102,7%@102,0%@102,0%@102] EMC_FREQ 0%@1600 GR3D_FREQ 0%@76 APE 25 PLL@32C CPU@35.5C PMIC@50C GPU@34C AO@38C thermal@34.75C
RAM 509/1972MB (lfb 20x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [11%@1479,14%@1479,11%@1479,23%@1479] EMC_FREQ 1%@1600 GR3D_FREQ 0%@76 APE 25 PLL@32.5C CPU@37C PMIC@50C GPU@34.5C AO@38.5C thermal@34.5C
RAM 543/1972MB (lfb 19x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [44%@1479,29%@1479,27%@1479,77%@1479] EMC_FREQ 1%@1600 GR3D_FREQ 0%@76 APE 25 PLL@32.5C CPU@36.5C PMIC@50C GPU@34.5C AO@38.5C thermal@35.5C
RAM 555/1972MB (lfb 19x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [60%@1479,29%@1479,1%@1479,1%@1479] EMC_FREQ 1%@1600 GR3D_FREQ 0%@76 APE 25 PLL@32C CPU@37.5C PMIC@50C GPU@34C AO@38.5C thermal@35.75C
RAM 591/1972MB (lfb 19x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [79%@1479,8%@1479,0%@1479,0%@1479] EMC_FREQ 1%@1600 GR3D_FREQ 0%@76 APE 25 PLL@32.5C CPU@36C PMIC@50C GPU@34.5C AO@39C thermal@35.75C
RAM 612/1972MB (lfb 19x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [71%@1479,15%@1479,2%@1479,1%@1479] EMC_FREQ 1%@1600 GR3D_FREQ 0%@76 APE 25 PLL@32C CPU@36C PMIC@50C GPU@34C AO@38.5C thermal@35.25C
RAM 643/1972MB (lfb 15x2MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [87%@1479,3%@1479,4%@1479,0%@1479] EMC_FREQ 2%@1600 GR3D_FREQ 0%@76 APE 25 PLL@32C CPU@36C PMIC@50C GPU@34C AO@39C thermal@35.25C
RAM 779/1972MB (lfb 22x1MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [42%@1479,6%@1479,41%@1479,14%@1479] EMC_FREQ 2%@1600 GR3D_FREQ 32%@76 APE 25 PLL@32.5C CPU@36C PMIC@50C GPU@32C AO@38.5C thermal@34.25C
RAM 934/1972MB (lfb 22x1MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [26%@1479,10%@1479,47%@1479,29%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 40%@76 APE 25 PLL@32.5C CPU@37C PMIC@50C GPU@32.5C AO@39C thermal@34.5C
RAM 1147/1972MB (lfb 22x1MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [41%@1479,10%@1479,27%@1479,29%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@32.5C AO@39C thermal@34.25C
RAM 1266/1972MB (lfb 18x1MB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [19%@1479,14%@1479,61%@1479,11%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 39%@76 APE 25 PLL@32.5C CPU@36C PMIC@50C GPU@32.5C AO@39C thermal@34C
RAM 1424/1972MB (lfb 1x512kB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [35%@1479,15%@1479,49%@1479,17%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@32C AO@39C thermal@34.5C
RAM 1514/1972MB (lfb 1x512kB) SWAP 145/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [31%@1479,4%@1479,70%@1479,15%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 1%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@32C AO@39C thermal@34.25C
RAM 1651/1972MB (lfb 1x512kB) SWAP 162/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [48%@1479,39%@1479,26%@1479,43%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@32.5C CPU@36.5C PMIC@50C GPU@32.5C AO@39C thermal@34.25C
RAM 1837/1972MB (lfb 1x512kB) SWAP 162/9178MB (cached 18MB) IRAM 0/252kB(lfb 252kB) CPU [23%@1479,70%@1479,25%@1479,21%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 39%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@32.5C AO@38.5C thermal@35C
RAM 1915/1972MB (lfb 1x512kB) SWAP 222/9178MB (cached 17MB) IRAM 0/252kB(lfb 252kB) CPU [19%@1479,49%@1479,55%@1479,46%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@37C PMIC@50C GPU@32.5C AO@39C thermal@34.5C
RAM 1912/1972MB (lfb 1x2MB) SWAP 359/9178MB (cached 15MB) IRAM 0/252kB(lfb 252kB) CPU [51%@1479,55%@1479,55%@1479,45%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33.5C CPU@38.5C PMIC@50C GPU@32.5C AO@39C thermal@35.5C
RAM 1915/1972MB (lfb 1x512kB) SWAP 475/9178MB (cached 8MB) IRAM 0/252kB(lfb 252kB) CPU [31%@1479,21%@1479,67%@1479,84%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@37C PMIC@50C GPU@32.5C AO@39.5C thermal@35C
RAM 1924/1972MB (lfb 1x512kB) SWAP 576/9178MB (cached 5MB) IRAM 0/252kB(lfb 252kB) CPU [76%@1479,21%@1479,23%@1479,85%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@37.5C PMIC@50C GPU@32.5C AO@39.5C thermal@34.5C
RAM 1915/1972MB (lfb 1x512kB) SWAP 684/9178MB (cached 4MB) IRAM 0/252kB(lfb 252kB) CPU [73%@1479,22%@1479,90%@1479,23%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@37C PMIC@50C GPU@33C AO@39.5C thermal@35C
RAM 1914/1972MB (lfb 1x512kB) SWAP 759/9178MB (cached 6MB) IRAM 0/252kB(lfb 252kB) CPU [100%@1479,22%@1479,74%@1479,38%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@37.5C PMIC@50C GPU@32.5C AO@39.5C thermal@35C
RAM 1886/1972MB (lfb 1x512kB) SWAP 831/9178MB (cached 9MB) IRAM 0/252kB(lfb 252kB) CPU [63%@1479,4%@1479,44%@1479,66%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33.5C CPU@36.5C PMIC@50C GPU@35.5C AO@39C thermal@36.25C
RAM 1884/1972MB (lfb 1x512kB) SWAP 934/9178MB (cached 11MB) IRAM 0/252kB(lfb 252kB) CPU [70%@1479,14%@1479,93%@1479,8%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 1%@76 APE 25 PLL@33C CPU@37C PMIC@50C GPU@33C AO@39.5C thermal@35C
RAM 1901/1972MB (lfb 1x512kB) SWAP 1021/9178MB (cached 38MB) IRAM 0/252kB(lfb 252kB) CPU [51%@1479,19%@1479,50%@1479,41%@1479] EMC_FREQ 4%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@37.5C PMIC@50C GPU@33C AO@39.5C thermal@35.25C
RAM 1899/1972MB (lfb 1x512kB) SWAP 1081/9178MB (cached 88MB) IRAM 0/252kB(lfb 252kB) CPU [70%@1479,41%@1479,25%@1479,22%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@36C PMIC@50C GPU@32.5C AO@39C thermal@34.25C
RAM 1874/1972MB (lfb 1x512kB) SWAP 1110/9178MB (cached 89MB) IRAM 0/252kB(lfb 252kB) CPU [14%@1479,46%@1479,25%@1479,48%@1479] EMC_FREQ 2%@1600 GR3D_FREQ 10%@76 APE 25 PLL@32.5C CPU@36.5C PMIC@50C GPU@32.5C AO@39C thermal@34.5C
RAM 1880/1972MB (lfb 1x512kB) SWAP 1119/9178MB (cached 57MB) IRAM 0/252kB(lfb 252kB) CPU [14%@1479,16%@1479,47%@1479,10%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@32.5C AO@39C thermal@34.25C
RAM 1904/1972MB (lfb 1x512kB) SWAP 1220/9178MB (cached 80MB) IRAM 0/252kB(lfb 252kB) CPU [21%@1479,43%@1479,28%@1479,44%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@32.5C AO@39.5C thermal@34.5C
RAM 1892/1972MB (lfb 3x512kB) SWAP 1234/9178MB (cached 42MB) IRAM 0/252kB(lfb 252kB) CPU [21%@1479,32%@1479,30%@1479,11%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 29%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@33C AO@39.5C thermal@34.5C
RAM 1886/1972MB (lfb 3x512kB) SWAP 1305/9178MB (cached 50MB) IRAM 0/252kB(lfb 252kB) CPU [14%@1479,28%@1479,12%@1479,36%@1479] EMC_FREQ 2%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@32.5C AO@39.5C thermal@35C
RAM 1880/1972MB (lfb 3x512kB) SWAP 1386/9178MB (cached 80MB) IRAM 0/252kB(lfb 252kB) CPU [16%@1224,20%@1224,21%@1224,12%@1224] EMC_FREQ 2%@1600 GR3D_FREQ 12%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@32.5C AO@39C thermal@35C
RAM 1895/1972MB (lfb 3x512kB) SWAP 1411/9178MB (cached 30MB) IRAM 0/252kB(lfb 252kB) CPU [16%@1224,18%@1224,36%@1224,18%@1224] EMC_FREQ 3%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@33C AO@39.5C thermal@35C
RAM 1881/1972MB (lfb 3x512kB) SWAP 1514/9178MB (cached 49MB) IRAM 0/252kB(lfb 252kB) CPU [21%@1479,14%@1479,20%@1479,23%@1428] EMC_FREQ 3%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@37C PMIC@50C GPU@33C AO@39.5C thermal@35C
RAM 1888/1972MB (lfb 3x512kB) SWAP 1590/9178MB (cached 60MB) IRAM 0/252kB(lfb 252kB) CPU [18%@1479,32%@1479,18%@1479,35%@1479] EMC_FREQ 3%@1600 GR3D_FREQ 0%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@33C AO@39.5C thermal@34.75C
RAM 1887/1972MB (lfb 3x512kB) SWAP 1604/9178MB (cached 24MB) IRAM 0/252kB(lfb 252kB) CPU [17%@1479,25%@1479,16%@1479,21%@1479] EMC_FREQ 2%@1600 GR3D_FREQ 99%@76 APE 25 PLL@33C CPU@36.5C PMIC@50C GPU@33C AO@39.5C thermal@34.75C
RAM 1860/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 97MB) IRAM 0/252kB(lfb 252kB) CPU [30%@1428,32%@1428,8%@1428,35%@1428] EMC_FREQ 8%@1600 GR3D_FREQ 99%@921 APE 25 PLL@33C CPU@37C PMIC@50C GPU@35C AO@39.5C thermal@35.25C
RAM 1855/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 97MB) IRAM 0/252kB(lfb 252kB) CPU [9%@1479,100%@1479,2%@1479,0%@1479] EMC_FREQ 15%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34C CPU@38.5C PMIC@50C GPU@35.5C AO@40C thermal@37C
RAM 1855/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 97MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,100%@1479,0%@1479,0%@1479] EMC_FREQ 19%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34C CPU@38.5C PMIC@50C GPU@35.5C AO@40.5C thermal@37C
RAM 1855/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 97MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,100%@1479,0%@1479,0%@1479] EMC_FREQ 22%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34.5C CPU@39C PMIC@50C GPU@36C AO@40.5C thermal@37.5C
RAM 1855/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 97MB) IRAM 0/252kB(lfb 252kB) CPU [2%@1479,100%@1479,1%@1479,0%@1479] EMC_FREQ 25%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34.5C CPU@39C PMIC@50C GPU@35.5C AO@40.5C thermal@37.5C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,100%@1479,0%@1479,0%@1479] EMC_FREQ 25%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34C CPU@39C PMIC@50C GPU@36C AO@40.5C thermal@37.5C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,100%@1479,0%@1479,0%@1479] EMC_FREQ 27%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34.5C CPU@39C PMIC@50C GPU@36C AO@41C thermal@37.5C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,100%@1479,0%@1479,0%@1479] EMC_FREQ 27%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34.5C CPU@39C PMIC@50C GPU@36.5C AO@40.5C thermal@37.5C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,64%@1479,37%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34.5C CPU@38.5C PMIC@50C GPU@36.5C AO@41C thermal@38C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@35C CPU@39C PMIC@50C GPU@37C AO@41C thermal@37.75C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@34.5C CPU@39C PMIC@50C GPU@37C AO@41C thermal@37.75C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 29%@1600 GR3D_FREQ 99%@921 APE 25 PLL@35C CPU@38.5C PMIC@50C GPU@36.5C AO@41C thermal@38C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@35C CPU@39C PMIC@50C GPU@37C AO@41.5C thermal@38C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 29%@1600 GR3D_FREQ 99%@921 APE 25 PLL@35C CPU@39C PMIC@50C GPU@37.5C AO@42C thermal@38C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,1%@1479,100%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@35.5C CPU@39.5C PMIC@50C GPU@37C AO@41.5C thermal@38C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 29%@1600 GR3D_FREQ 99%@921 APE 25 PLL@35.5C CPU@39.5C PMIC@50C GPU@37C AO@41.5C thermal@38.25C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@36C CPU@39.5C PMIC@50C GPU@37.5C AO@42C thermal@38.5C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [2%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@35.5C CPU@39.5C PMIC@50C GPU@37.5C AO@42C thermal@38.5C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@36C CPU@39.5C PMIC@50C GPU@37.5C AO@42C thermal@38.5C
RAM 1856/1972MB (lfb 3x512kB) SWAP 1784/9178MB (cached 98MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,100%@1479,0%@1479] EMC_FREQ 28%@1600 GR3D_FREQ 99%@921 APE 25 PLL@36C CPU@39.5C PMIC@50C GPU@37.5C AO@42C thermal@38.75C
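In the log above, RAM climbs from about 500 MB to about 1850 MB of 1972 MB. Swap grows from 145 MB to 1784 MB, and GR3D_FREQ ends at 99%@921. Below is a minimal sketch for pulling those three fields out of each tegrastats line so the trend is easier to read or plot. The regular expressions are based only on the line format shown here and may need adjusting for other L4T releases.

```python
import re

_RAM = re.compile(r"RAM (\d+)/(\d+)MB")
_SWAP = re.compile(r"SWAP (\d+)/(\d+)MB")
_GR3D = re.compile(r"GR3D_FREQ (\d+)%@(\d+)")


def parse_tegrastats(line: str) -> dict:
    """Extract RAM, swap and GPU load/frequency from one tegrastats line."""
    ram = _RAM.search(line)    # leftmost match is the system RAM field, not IRAM (kB)
    swap = _SWAP.search(line)
    gpu = _GR3D.search(line)
    if not (ram and swap and gpu):
        raise ValueError("unrecognised tegrastats line")
    return {
        "ram_used_mb": int(ram.group(1)),
        "ram_total_mb": int(ram.group(2)),
        "swap_used_mb": int(swap.group(1)),
        "gpu_load_pct": int(gpu.group(1)),
        "gpu_freq_mhz": int(gpu.group(2)),
    }
```

Feeding every line of a saved log through `parse_tegrastats` gives one record per sample. Comparing `swap_used_mb` across records shows when the system started swapping.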

Hi,

Based on the log, the app is running out of physical memory (RAM 1856/1972MB) and falls back to swap, which is much slower (SWAP 1784/9178MB).

To get better performance, please use a model and batch size that fit within the Nano's memory limits.

Thanks.
