TX2 running slow

chrisak7r2 · March 29, 2019, 1:57pm

Hi there,

I have flashed around 10 TX2s in the past and all ran at expected speeds. These were bought as part of dev boards. I have recently come to flash 4 TX2 modules bought as modules only (no dev board) and all 4 of them are running slow for some reason I have not been able to decipher. These are used with Orbitty carrier boards. They are running 4 to 5x slower than the ones bought as part of devkits.

nvpmodel is 0 or MAXN

jetson_clocks.sh is run on system start.

It is nothing to do with our software as I experience the same slow down running CUDA samples.

Do you have any ideas? Thanks for any help. Happy to provide whatever is needed as I really need to get this sorted.

linuxdev · March 29, 2019, 4:51pm

“jetson_clocks.sh” will fail if you run it too early on boot. Have you tried manually running “nvpmodel -m 0” and “jetson_clocks.sh” again after perhaps a 30 second wait beyond boot completing?
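
For anyone wanting to script the suggestion above, a minimal sketch follows. It assumes jetson_clocks.sh sits in the home directory (check where it is on your release); the function name, the delay argument and the DRY_RUN switch are made up for illustration and are not part of any NVIDIA tool.

```shell
#!/bin/sh
# Sketch: re-apply the max-performance settings some time after boot.
# Assumptions (not from the thread): the script path default and the
# DRY_RUN switch, which only prints the commands instead of running them.

apply_max_clocks() {
    delay="${1:-30}"                              # seconds to wait first
    clocks_script="${2:-$HOME/jetson_clocks.sh}"  # assumed location
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"

    $run sleep "$delay"
    $run sudo nvpmodel -m 0
    $run sudo "$clocks_script"
}
```

Calling `apply_max_clocks 30` from a terminal after login would wait 30 seconds and then run both commands; setting `DRY_RUN=1` first shows what would be executed without touching the clocks.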

chrisak7r2 · March 29, 2019, 4:57pm

No change I’m afraid.

JerryChang · April 1, 2019, 5:44am

hello chrisak7r2,

regarding the statement below

“all 4 of them are running slow for some reason”

could you please…

have you confirmed that all of them were flashed with the same image?
may I know which application you used to evaluate the system performance?
could you please share the side-by-side comparison results?
you may also run the tegrastats utility to monitor and compare the hardware usage.
thanks

chrisak7r2 · April 1, 2019, 11:46am

Hi Jerry, thanks for your reply.

Yes, all the same image. There could possibly be some issue with the image, though I believe it has never been changed in any way. It is flashed in the standard way using “/Linux_for_Tegra_tx2/flash.sh orbitty mmcblk0p1”, and then CUDA etc. is added using JetPack. It must be an issue with this step or before, as I am seeing slowness even in the low-level CUDA examples.
Two things: our internal application (CNN inference on images) and, probably more useful to you, the CUDA examples, particularly the particles one (NVIDIA_CUDA-8.0_Samples/5_Simulations/particles). There is an FPS counter in the top bar on Ubuntu. It runs at approximately 100-150 fps, compared to roughly 600 fps on a device I flashed a while ago.
Happy to run any side-by-side evaluation if you guys know of one; my method is a little quick and dirty at the moment, but I believe it to be accurate.
I have reviewed the tegrastats output and there doesn’t appear to be much difference. Perhaps CPU usage is a little higher on the slower one. Of the 6 CPU cores shown in tegrastats, generally only 2 max out when running my application. RAM usage is roughly equal.

Totally open to any suggestions.

Thanks.
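
To make the tegrastats comparison above less of an eyeball exercise, here is a rough sketch that counts how many cores are at or above a load threshold in each sample. It assumes each tegrastats line contains a field of the form `CPU [12%@2035,off,...]`; the exact layout may differ between L4T releases, so treat the pattern as an assumption. The function name `busy_cores` is made up.

```shell
#!/bin/sh
# Sketch: for every tegrastats line on stdin, print "<busy>/<total>" cores,
# where "busy" means load >= threshold percent (default 90).
# Assumption: the line has a "CPU [load%@freq,...]" field; cores reported
# as "off" count towards the total but never as busy.

busy_cores() {
    awk -v t="${1:-90}" '
        match($0, /[Cc][Pp][Uu] \[[^]]*\]/) {
            # strip the leading "CPU [" (5 chars) and the trailing "]"
            list = substr($0, RSTART + 5, RLENGTH - 6)
            n = split(list, core, ",")
            busy = 0
            for (i = 1; i <= n; i++)
                if (core[i] ~ /^[0-9]+%/ && core[i] + 0 >= t) busy++
            print busy "/" n
        }'
}
```

Piping the tegrastats output of the fast and the slow module through `busy_cores 90` while the same workload runs gives one number per sample that can be compared directly.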

JerryChang · April 2, 2019, 3:43am

hello chrisak7r2,

we would like to check this internally, could you please share the hw revision of these TX2 modules?
or, please execute the command below on two different TX2s and share the output with us. thanks

$ dmesg | grep "tegra-id\|DTS"

chrisak7r2 · April 2, 2019, 9:45am

nvidia@tegra-ubuntu:~$ dmesg | grep "tegra-id\|DTS"
[ 0.008793] tegra-id: chipid=21817.
[ 0.012583] tegra-id: opt_subrevision=1.
[ 0.541862] DTS File Name: /usr/parker/linux4tegra/branches/27.1/Linux_for_Tegra_tx2/sources/kernel_source/kernel-4.4/arch/arm64/boot/dts/…/…/…/…/…/…/hardware/nvidia/platform/t18x/quill/kernel-dts/tegra186-tx2-cti-ASG001-base.dts

nvidia@tegra-ubuntu:~$ dmesg | grep "tegra-id\|DTS"
[ 0.008930] tegra-id: chipid=21817.
[ 0.012760] tegra-id: opt_subrevision=1.
[ 0.571753] DTS File Name: /usr/parker/linux4tegra/branches/27.1/Linux_for_Tegra_tx2/sources/kernel_source/kernel-4.4/arch/arm64/boot/dts/…/…/…/…/…/…/hardware/nvidia/platform/t18x/quill/kernel-dts/tegra186-tx2-cti-ASG001-base.dts

Seems exactly the same to me.

JerryChang · April 3, 2019, 2:05am

hello chrisak7r2,

they appear to be the same hardware revision.
however, this brought to my attention that you’re still working on release 27.1.

we have already released JetPack 4.2 (l4t-r32.1): https://developer.nvidia.com/embedded/jetpack
or, the L4T driver package (BSP only, l4t-r28.3): https://developer.nvidia.com/embedded/downloads
could you please move to the latest release and verify your use case?
thanks

chrisak7r2 · April 3, 2019, 9:52am

Thanks Jerry, I have to switch focus for a couple of weeks but will update after I have done this.

Dourado · April 5, 2019, 1:00am

Hey Jerry, I’ve been experiencing some performance loss since I moved my TX1 module to an Orbitty Carrier. Could you help me with it?
I’m using Jetpack 3.3 (L4T r28.2) with CTI-L4T-V020.

Here’s my dmesg for tegra id:

[    0.025358] tegra-id: chipid=22117.
[    0.025390] tegra-id: opt_subrevision=0.
[    0.026769] DTS File Name: tegra210-tx1-cti-ASG003.dts
[    0.169031] DTS File Name: tegra210-tx1-cti-ASG003.dts

Thank you.

cobrien · April 5, 2019, 6:19pm

Are you running “jetson_clocks.sh” (or something similar) as part of your software
startup process?

We had a problem where we were configuring the clocks in our startup, but sometimes
(about 1 out of 5 or 10 starts) nvpmodel.service would run AFTER our setup and
change them back. Took a while to figure out. Fixed with

systemctl disable nvpmodel.service

We are also looking at slowdown due to heat issues.
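
One way to tell whether the clocks were changed back or heat is involved is to snapshot the CPU frequencies and temperatures right after setting the clocks and again a minute or so later. The sketch below only reads the generic Linux cpufreq and thermal sysfs files; the function name and the optional root argument are made up for illustration, and GPU clocks are not covered.

```shell
#!/bin/sh
# Sketch: print current vs. maximum CPU frequency (kHz) per online core and
# the temperature (millidegrees C) of each thermal zone.
# Assumption: standard sysfs layout; the root argument exists only so the
# function can be pointed at a copy of the tree.

report_clocks() {
    root="${1:-/sys}"
    for d in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq; do
        [ -r "$d/scaling_cur_freq" ] || continue   # skips offline cores
        cpu=$(basename "$(dirname "$d")")
        echo "$cpu cur=$(cat "$d/scaling_cur_freq") max=$(cat "$d/cpuinfo_max_freq")"
    done
    for z in "$root"/class/thermal/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        echo "$(cat "$z/type" 2>/dev/null) temp_mC=$(cat "$z/temp" 2>/dev/null)"
    done
}
```

If `cur` drops well below `max` between two calls while the temperatures stay moderate, something reset the clocks; if the temperatures climb at the same time, throttling is the more likely explanation.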

Dourado · April 8, 2019, 6:19pm

Even when I manually set jetson clocks after boot, it doesn’t have the same performance on the Orbitty Carrier as on the DevKit.

Also, I’m experiencing a buffer overflow when I run my work-in-progress software, and I’m still trying to find its source. The same source code doesn’t overflow on the DevKit.

JerryChang · April 12, 2019, 3:13am

hello Dourado,

it seems the issues are related to your own carrier board.
I suggest you refer to the Adaptation Guide for your customization.
thanks

jfernandez · August 28, 2019, 6:37pm

I’m also seeing a performance decrease on Xavier with Jetpack 4.2.2. Jetpack 4.1 could run Yolov3 at 45 fps, but now Jetpack 4.2.2 can’t get past 25 fps. I started with the exact same software and then recompiled some libs, but that brought no improvement. I also maxed out the clocks after boot. The min freq of the GPU was set to 1377000000, same as in 4.1, but no change.

My only guess at this point is that 4.2.2 has changes that are affecting performance, and is therefore USELESS.

JerryChang · August 29, 2019, 2:26am

hello jfernandez,

there are some similar discussion threads about the Yolo performance issue. please also check Topic 1060789 and Topic 1061155 for reference.

to gather more details, please refer to Topic 1058668 for configuration modifications.

however,
please try manually reducing the network resolution in the first few lines of yolov3.cfg; you might see a performance improvement.
for example,

width=416
height=416
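
If you would rather script the change than edit the file by hand, a minimal sketch follows. It assumes the only lines starting with `width=` and `height=` in yolov3.cfg are the network resolution ones, and that both values should be multiples of 32 as darknet expects for YOLOv3; the function name is made up, and a `.bak` copy of the original file is kept.

```shell
#!/bin/sh
# Sketch: set the network input resolution in a darknet cfg file.
# Assumptions: width=/height= appear only as the network resolution keys,
# and values must be positive multiples of 32.

set_net_resolution() {
    cfg="$1"; width="$2"; height="$3"
    for v in "$width" "$height"; do
        case "$v" in
            ''|*[!0-9]*) echo "not a number: $v" >&2; return 1 ;;
        esac
        if [ $(( $v % 32 )) -ne 0 ]; then
            echo "$v is not a multiple of 32" >&2; return 1
        fi
    done
    # -i.bak edits in place and leaves the original as <cfg>.bak
    sed -i.bak \
        -e "s/^width=.*/width=$width/" \
        -e "s/^height=.*/height=$height/" \
        "$cfg"
}
```

For the example above this would be `set_net_resolution yolov3.cfg 416 416`.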

jfernandez · August 29, 2019, 6:50pm

@JerryChang, thank you!

It is now running faster than before at 50 fps. It seems it had to do with an API change.

Thanks again.