PyTorch with JetPack 4.2 runs slower than with 3.3

Hello everyone.
We tried the latest JetPack 4.2, installed with the SDK Manager, on a TX2, and we noticed that several PyTorch models run more slowly than on JetPack 3.3 (for example, the MTCNN face detector). I shared the testing code on GitHub: [url]https://github.com/AndreyMaslow/pytorch-tests-jetpacks[/url]

This script measures average prediction time on mtcnn.
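
For readers who want to reproduce this kind of measurement, here is a minimal sketch of an average-latency timer. It is not the code from the repository above; the function name `average_latency` and its parameters are made up for illustration. The one point it tries to show is that GPU work is queued asynchronously, so a synchronization call (for PyTorch, `torch.cuda.synchronize`) should run before the clock is started and stopped, and a few warm-up calls should be excluded from the average.

```python
import time


def average_latency(fn, n_warmup=5, n_runs=20, sync=None):
    """Return the mean wall-clock seconds per call of fn() (hypothetical helper)."""
    # Warm-up calls absorb one-time costs such as CUDA context creation.
    for _ in range(n_warmup):
        fn()
    if sync is not None:
        sync()  # drain queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    if sync is not None:
        sync()  # wait for the last kernels before stopping the clock
    return (time.perf_counter() - start) / n_runs


# Stand-in CPU workload so the sketch runs anywhere; on the TX2 this would be
# the model's forward pass, with sync=torch.cuda.synchronize.
print(average_latency(lambda: sum(range(10000))))
```

Timing both JetPack versions with the same warm-up and run counts makes the two averages directly comparable.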

Could you please help me understand what the problem is? Thank you very much in advance!

The results of measuring

Hi,

How did you install PyTorch?
A PyTorch library built against a different JetPack version can cause issues.
Please check first whether your PyTorch was built on JetPack 4.2.
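
One way to check this from Python is to print the versions that the installed wheel reports. This is only a sketch: the helper name `describe_torch_build` is hypothetical, and the output still has to be compared by hand against the CUDA and cuDNN versions that the flashed JetPack provides.

```python
def describe_torch_build():
    """Collect version details of the installed PyTorch, if any (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch": torch.__version__,
        # CUDA version the wheel was compiled against, not the one on the system.
        "cuda_build": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

If `cuda_build` does not match the CUDA toolkit installed by the JetPack release on the board, the wheel was probably built for a different JetPack.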

Here is one for your reference:
[url]https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/[/url]

Thanks.

Hello AastaLLL.
I used this script to install PyTorch:
https://github.com/AndreyMaslow/pytorch-tests-jetpacks/blob/master/installs.sh
I also reflashed the Jetson and used NVIDIA's pre-built wheel:

wget https://nvidia.box.com/shared/static/veo87trfaawj5pfwuqvhl6mzc5b55fbj.whl -O torch-1.1.0a0+b457266-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.1.0a0+b457266-cp36-cp36m-linux_aarch64.whl

But I get the same result.

Hi,

Sorry for the late reply.

May I know whether you maximized the TX2 performance before profiling?

sudo jetson_clocks.sh

It's also recommended to monitor the system status with tegrastats at the same time.
This can show whether the system is really busy or is waiting on some task.

sudo tegrastats

Thanks.

Hello AastaLLL.
Yes, I use jetson_clocks and nvpmodel -m0.
This is the tegrastats output on the TX2 with JetPack 4.2:

RAM 2141/7852MB (lfb 935x4MB) CPU [3%@1931,100%@2029,28%@2026,0%@1932,2%@1932,0%@1937] EMC_FREQ 1%@1866 GR3D_FREQ 3%@1300 APE 150 MTS fg 0% bg 2% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@34C BCPU@35.5C thermal@34.9C Tdiode@32.75C VDD_SYS_GPU 384/384 VDD_SYS_SOC 844/844 VDD_4V0_WIFI 573/573 VDD_IN 5723/5723 VDD_SYS_CPU 1842/1842 VDD_SYS_DDR 1299/1299
RAM 2291/7852MB (lfb 934x4MB) CPU [3%@2034,100%@2034,23%@2036,2%@2035,2%@2034,1%@2035] EMC_FREQ 1%@1866 GR3D_FREQ 0%@1300 APE 150 MTS fg 0% bg 1% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@33.5C BCPU@35.5C thermal@34.9C Tdiode@32.75C VDD_SYS_GPU 307/345 VDD_SYS_SOC 844/844 VDD_4V0_WIFI 439/506 VDD_IN 5570/5646 VDD_SYS_CPU 1843/1842 VDD_SYS_DDR 1279/1289
RAM 2431/7852MB (lfb 924x4MB) CPU [2%@1917,100%@2006,31%@2003,2%@1917,0%@1917,3%@1917] EMC_FREQ 1%@1866 GR3D_FREQ 1%@1300 APE 150 MTS fg 0% bg 1% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@34C BCPU@35.5C thermal@34.9C Tdiode@32.75C VDD_SYS_GPU 307/332 VDD_SYS_SOC 844/844 VDD_4V0_WIFI 649/553 VDD_IN 5916/5736 VDD_SYS_CPU 1919/1868 VDD_SYS_DDR 1299/1292
RAM 2625/7852MB (lfb 880x4MB) CPU [2%@1995,100%@2036,25%@2035,1%@2034,2%@2035,1%@2034] EMC_FREQ 1%@1866 GR3D_FREQ 1%@1300 APE 150 MTS fg 0% bg 2% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@33.5C BCPU@35.5C thermal@34.9C Tdiode@32.75C VDD_SYS_GPU 307/326 VDD_SYS_SOC 844/844 VDD_4V0_WIFI 267/482 VDD_IN 5416/5656 VDD_SYS_CPU 1843/1861 VDD_SYS_DDR 1318/1298
RAM 2771/7852MB (lfb 843x4MB) CPU [4%@1932,100%@2029,30%@2029,3%@1918,2%@1919,2%@1918] EMC_FREQ 1%@1866 GR3D_FREQ 1%@1300 APE 150 MTS fg 0% bg 2% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@34C BCPU@35.5C thermal@34.7C Tdiode@32.75C VDD_SYS_GPU 384/337 VDD_SYS_SOC 844/844 VDD_4V0_WIFI 420/469 VDD_IN 5685/5662 VDD_SYS_CPU 1920/1873 VDD_SYS_DDR 1299/1298
RAM 2914/7852MB (lfb 808x4MB) CPU [2%@2035,99%@2034,27%@2034,2%@2034,2%@2035,2%@2035] EMC_FREQ 2%@1866 GR3D_FREQ 4%@1300 APE 150 MTS fg 1% bg 5% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@33.5C BCPU@35.5C thermal@34.9C Tdiode@32.75C VDD_SYS_GPU 460/358 VDD_SYS_SOC 844/844 VDD_4V0_WIFI 573/486 VDD_IN 6069/5729 VDD_SYS_CPU 1919/1881 VDD_SYS_DDR 1337/1305
RAM 2987/7852MB (lfb 790x4MB) CPU [3%@1998,99%@2034,28%@2034,2%@1998,2%@1987,3%@1962] EMC_FREQ 2%@1866 GR3D_FREQ 0%@1300 APE 150 MTS fg 1% bg 11% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@34C BCPU@36C thermal@34.9C Tdiode@33C VDD_SYS_GPU 614/394 VDD_SYS_SOC 844/844 VDD_4V0_WIFI 324/463 VDD_IN 5992/5767 VDD_SYS_CPU 2072/1908 VDD_SYS_DDR 1375/1315
RAM 2987/7852MB (lfb 789x4MB) CPU [8%@1943,58%@2038,70%@2035,4%@2035,6%@2034,4%@2036] EMC_FREQ 3%@1866 GR3D_FREQ 1%@1300 APE 150 MTS fg 2% bg 13% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@34C BCPU@35.5C thermal@35.2C Tdiode@33.25C VDD_SYS_GPU 998/470 VDD_SYS_SOC 921/853 VDD_4V0_WIFI 458/462 VDD_IN 6643/5876 VDD_SYS_CPU 1995/1919 VDD_SYS_DDR 1451/1332
RAM 1661/7852MB (lfb 930x4MB) CPU [2%@1995,73%@2034,31%@2035,1%@2034,1%@2034,4%@2010] EMC_FREQ 3%@1866 GR3D_FREQ 0%@1300 APE 150 MTS fg 0% bg 6% PLL@35.5C MCPU@35.5C PMIC@100C Tboard@33C GPU@34C BCPU@35.5C thermal@35.2C Tdiode@33C VDD_SYS_GPU 384/460 VDD_SYS_SOC 844/852 VDD_4V0_WIFI 534/470 VDD_IN 5455/5829 VDD_SYS_CPU 1459/1868 VDD_SYS_DDR 1337/1332
RAM 1661/7852MB (lfb 930x4MB) CPU [0%@2027,14%@2034,16%@2034,1%@2035,1%@2035,0%@2034] EMC_FREQ 2%@1866 GR3D_FREQ 1%@1300 APE 150 MTS fg 0% bg 0% PLL@35C MCPU@35C PMIC@100C Tboard@33C GPU@33.5C BCPU@35C thermal@34.4C Tdiode@32.5C VDD_SYS_GPU 230/437 VDD_SYS_SOC 845/851 VDD_4V0_WIFI 458/469 VDD_IN 4342/5681 VDD_SYS_CPU 845/1765 VDD_SYS_DDR 1222/1321
RAM 1661/7852MB (lfb 930x4MB) CPU [4%@1996,22%@2034,19%@2034,1%@1998,1%@1997,0%@1999] EMC_FREQ 1%@1866 GR3D_FREQ 0%@1300 APE 150 MTS fg 0% bg 0% PLL@35C MCPU@35C PMIC@100C Tboard@33C GPU@33.5C BCPU@35C thermal@34.4C Tdiode@32.5C VDD_SYS_GPU 230/418 VDD_SYS_SOC 845/851 VDD_4V0_WIFI 649/485 VDD_IN 4458/5569 VDD_SYS_CPU 768/1675 VDD_SYS_DDR 1203/1310
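
To read a log like the one above without scanning every field, the per-core CPU load and the GPU load (GR3D_FREQ) can be pulled out of each line. The following is a rough sketch that assumes the line format shown in this post; the function name `parse_tegrastats` is made up for illustration, and the sample string is the beginning of the first log line.

```python
import re

CPU_RE = re.compile(r"CPU \[([^\]]*)\]")
GPU_RE = re.compile(r"GR3D_FREQ (\d+)%")


def parse_tegrastats(line):
    """Return (per-core CPU load percentages, GPU load percentage) for one line."""
    cpu_match = CPU_RE.search(line)
    gpu_match = GPU_RE.search(line)
    if cpu_match is None or gpu_match is None:
        raise ValueError("not a tegrastats line in the expected format")
    # Each core entry looks like "100%@2029": load, then clock in MHz.
    # Entries without a percentage are skipped.
    cpu_loads = [int(core.split("%")[0])
                 for core in cpu_match.group(1).split(",") if "%" in core]
    return cpu_loads, int(gpu_match.group(1))


sample = ("RAM 2141/7852MB (lfb 935x4MB) "
          "CPU [3%@1931,100%@2029,28%@2026,0%@1932,2%@1932,0%@1937] "
          "EMC_FREQ 1%@1866 GR3D_FREQ 3%@1300 APE 150")
cpu_loads, gpu_load = parse_tegrastats(sample)
print(max(cpu_loads), gpu_load)  # prints: 100 3
```

For this sample, one CPU core is at 100% while the GPU is at 3%, which is the pattern that repeats through most of the log.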

Hi,

It looks like your GPU utilization is pretty low:

GR3D_FREQ 3%@1300

Not sure whether it is blocked by preprocessing or by some other task.
Could you profile it with nvprof and share the result with us first?

Thanks.