Performance Variance Between Jetsons

Hello,

We noticed that there seems to be a significant performance difference across different Jetson TX2s. In other words, our application runs great on a few Jetson TX2s but is not operating as fast as it should on another set. In the process of isolating the cause, it appears to be a physical difference between the Jetsons. We’ve observed a ~15% performance difference under the same software/hardware conditions (we flashed the same setup to 2 different Jetsons back-to-back and the performance difference persists). We initially noticed the difference in our product, but the behavior is exactly the same on the dev kit board with our SW removed. The difference isn’t just noticeable in our app; the CUDA samples also illustrate the difference in performance.

We’ve measured this by running the bandwidthTest sample that ships with CUDA, under /usr/local/cuda-8.0/samples/1_Utilities/bandwidthTest.

Command used is ./bandwidthTest --mode=shmoo --csv

Results are attached but below is a relevant snippet.

Good: bandwidthTest-H2D-Pinned, Bandwidth = 20145.7 MB/s, Time = 0.00318 s, Size = 67186688 bytes, NumDevsUsed = 1
Bad:  bandwidthTest-H2D-Pinned, Bandwidth = 17135.0 MB/s, Time = 0.00374 s, Size = 67186688 bytes, NumDevsUsed = 1

Good: bandwidthTest-D2D, Bandwidth = 36264.4 MB/s, Time = 0.00177 s, Size = 67186688 bytes, NumDevsUsed = 1
Bad:  bandwidthTest-D2D, Bandwidth = 31506.6 MB/s, Time = 0.00203 s, Size = 67186688 bytes, NumDevsUsed = 1

There appears to be a ~17% difference in the H2D Pinned test and ~15% difference in the D2D test.

We are operating in nvpmodel 0.

Is this variance expected, or is there something wrong with some of the Jetson modules we are using? We have a good group and a bad group; it’s hard to say exactly how many units are affected as of now. What bandwidth is expected from this test under the default L4T conditions with no other high-load apps running? Please help us understand our observations. Thank you

Good Log
Bad Log

Hi greg2,

We will try to reproduce this issue and investigate. In the meantime, please provide the serial numbers of the good and bad modules for reference.

Thanks

Hi,

Would you mind fixing the processor clocks to the maximum and giving it a try?
Our default is dynamic clocking, which may introduce this variance.

sudo jetson_clocks
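
If you want to confirm the clocks latched, the same tool should also be able to print the current state (assuming your release supports the --show option):

sudo jetson_clocks --show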

Thanks.

OK, I will give you the serial #'s tomorrow.

We already do that as part of our boot-up before the test; as root, we execute the following in a script:

nvpmodel -m 0
/home/nvidia/jetson_clocks.sh --restore <some_path>/clock_config.conf

Regarding the contents of clock_config.conf:

/sys/devices/system/cpu/cpu1/online:0
/sys/devices/system/cpu/cpu2/online:0
/sys/devices/system/cpu/cpu3/online:1
/sys/devices/system/cpu/cpu4/online:1
/sys/devices/system/cpu/cpu5/online:1
/sys/module/qos/parameters/enable:N
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:2035200
/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:2035200
/sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq:2035200
/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:2035200
/sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq:2035200
/sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq:2035200
/sys/kernel/debug/tegra_cpufreq/M_CLUSTER/cc3/enable:0
/sys/kernel/debug/tegra_cpufreq/B_CLUSTER/cc3/enable:0
/sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq:1300500000
/sys/devices/17000000.gp10b/railgate_enable:0
/sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked:1
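
For what it’s worth, the live clocks can be read back after the restore to confirm they are actually pinned. These are the same sysfs nodes used in the config above (the EMC rate node is an assumption based on the mrq_rate_locked path; exact paths can vary by L4T release):

# current CPU frequency (kHz)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
# current GPU frequency (Hz)
cat /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/cur_freq
# current EMC (memory) rate (Hz); lives in debugfs, so it needs root
sudo cat /sys/kernel/debug/bpmp/debug/clk/emc/rate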

We re-did the tests and took note of the serial numbers to maintain consistency (we didn’t know the serial #'s of the results above).
We captured the serial number with this, which matches what’s on the sticker:

cat /proc/cmdline
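
(If it helps anyone repeating this: the serial shows up as one of the boot parameters, so something like the following pulls it out, assuming the parameter name contains “serial” on your L4T release:)

tr ' ' '\n' < /proc/cmdline | grep -i serial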

Good serial: 0424518014359 (Log)
Bad serial: 0424718090774 (Log)

Let me know if there’s anything else you need. We didn’t realize this issue was having such a large impact until recently, and we don’t yet know how many units are affected.

Hi,

Would you mind repeating the experiment 10 times and sharing the results with us? We want to know whether the difference comes from run-to-run variance or from the specific platform.

We are also trying to reproduce this internally and will update you with more information later.
Thanks.

There seems to be very little variance between runs. The results across 10 runs are pretty consistent (i.e., on any one specific Jetson, the averages are very consistent).

Full output from all 10 runs is here:
Good Jetson
Bad Jetson

The delta between them depends a bit on the size of the test payload, but when you average all sizes together for each test type, it basically boils down to this:
Good:
Average for all bandwidthTest-D2D is 19911.82MB/s
Average for all bandwidthTest-H2D-Pinned is 12130.17MB/s
Average for all bandwidthTest-D2H-Pinned is 12517.22MB/s

Bad:
Average for all bandwidthTest-D2D is 17687.67MB/s
Average for all bandwidthTest-H2D-Pinned is 10313.41MB/s
Average for all bandwidthTest-D2H-Pinned is 10843.36MB/s

10 Run Results:
Good:

Run 0 type: bandwidthTest-H2D-Pinned
12128.3024691MB/s
Run 0 type: bandwidthTest-D2H-Pinned
12523.3790123MB/s
Run 0 type: bandwidthTest-D2D
19977.6777778MB/s
Run 1 type: bandwidthTest-H2D-Pinned
12131.8666667MB/s
Run 1 type: bandwidthTest-D2H-Pinned
12570.1481481MB/s
Run 1 type: bandwidthTest-D2D
19959.6641975MB/s
Run 2 type: bandwidthTest-H2D-Pinned
12161.9925926MB/s
Run 2 type: bandwidthTest-D2H-Pinned
12523.5135802MB/s
Run 2 type: bandwidthTest-D2D
19805.4567901MB/s
Run 3 type: bandwidthTest-H2D-Pinned
12164.6753086MB/s
Run 3 type: bandwidthTest-D2H-Pinned
12526.9469136MB/s
Run 3 type: bandwidthTest-D2D
19967.4938272MB/s
Run 4 type: bandwidthTest-H2D-Pinned
12062.6148148MB/s
Run 4 type: bandwidthTest-D2H-Pinned
12522.0901235MB/s
Run 4 type: bandwidthTest-D2D
19911.2679012MB/s
Run 5 type: bandwidthTest-H2D-Pinned
12157.6283951MB/s
Run 5 type: bandwidthTest-D2H-Pinned
12528.9222222MB/s
Run 5 type: bandwidthTest-D2D
19919.9135802MB/s
Run 6 type: bandwidthTest-H2D-Pinned
12045.045679MB/s
Run 6 type: bandwidthTest-D2H-Pinned
12408.7197531MB/s
Run 6 type: bandwidthTest-D2D
19906.0975309MB/s
Run 7 type: bandwidthTest-H2D-Pinned
12149.0320988MB/s
Run 7 type: bandwidthTest-D2H-Pinned
12537.462963MB/s
Run 7 type: bandwidthTest-D2D
19956.8962963MB/s
Run 8 type: bandwidthTest-H2D-Pinned
12130.117284MB/s
Run 8 type: bandwidthTest-D2H-Pinned
12549.5518519MB/s
Run 8 type: bandwidthTest-D2D
19887.1444444MB/s
Run 9 type: bandwidthTest-H2D-Pinned
12170.4728395MB/s
Run 9 type: bandwidthTest-D2H-Pinned
12481.5074074MB/s
Run 9 type: bandwidthTest-D2D
19826.6407407MB/s
Average for bandwidthTest-D2D is 19911.8253086MB/s
Average for bandwidthTest-H2D-Pinned is 12130.1748148MB/s
Average for bandwidthTest-D2H-Pinned is 12517.2241975MB/s

Bad:

Run 0 type: bandwidthTest-H2D-Pinned
10306.4592593MB/s
Run 0 type: bandwidthTest-D2H-Pinned
10828.6049383MB/s
Run 0 type: bandwidthTest-D2D
17739.8876543MB/s
Run 1 type: bandwidthTest-H2D-Pinned
10312.2493827MB/s
Run 1 type: bandwidthTest-D2H-Pinned
10776.9MB/s
Run 1 type: bandwidthTest-D2D
17789.1567901MB/s
Run 2 type: bandwidthTest-H2D-Pinned
10377.9802469MB/s
Run 2 type: bandwidthTest-D2H-Pinned
10904.2333333MB/s
Run 2 type: bandwidthTest-D2D
17810.1271605MB/s
Run 3 type: bandwidthTest-H2D-Pinned
10263.754321MB/s
Run 3 type: bandwidthTest-D2H-Pinned
10780.3024691MB/s
Run 3 type: bandwidthTest-D2D
17669.8975309MB/s
Run 4 type: bandwidthTest-H2D-Pinned
10288.8283951MB/s
Run 4 type: bandwidthTest-D2H-Pinned
10867.6802469MB/s
Run 4 type: bandwidthTest-D2D
17632.4407407MB/s
Run 5 type: bandwidthTest-H2D-Pinned
10323.4654321MB/s
Run 5 type: bandwidthTest-D2H-Pinned
10837.2469136MB/s
Run 5 type: bandwidthTest-D2D
17677.954321MB/s
Run 6 type: bandwidthTest-H2D-Pinned
10311.4185185MB/s
Run 6 type: bandwidthTest-D2H-Pinned
10839.7641975MB/s
Run 6 type: bandwidthTest-D2D
17614.3222222MB/s
Run 7 type: bandwidthTest-H2D-Pinned
10283.4580247MB/s
Run 7 type: bandwidthTest-D2H-Pinned
10857.5802469MB/s
Run 7 type: bandwidthTest-D2D
17608.6728395MB/s
Run 8 type: bandwidthTest-H2D-Pinned
10346.7432099MB/s
Run 8 type: bandwidthTest-D2H-Pinned
10869.6012346MB/s
Run 8 type: bandwidthTest-D2D
17751.8185185MB/s
Run 9 type: bandwidthTest-H2D-Pinned
10319.8234568MB/s
Run 9 type: bandwidthTest-D2H-Pinned
10871.6938272MB/s
Run 9 type: bandwidthTest-D2D
17582.4246914MB/s
Average for bandwidthTest-D2D is 17687.6702469MB/s
Average for bandwidthTest-H2D-Pinned is 10313.4180247MB/s
Average for bandwidthTest-D2H-Pinned is 10843.3607407MB/s

As you can see, even without averaging the results, a glance at the data shows that each unit is very consistent run to run, and there is an equally consistent gap between the two units.
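
In case it helps your repro, the per-type averages above can be reproduced from the --csv output with something along these lines (a rough sketch; the field layout is assumed to match the CSV snippet earlier in the thread):

for i in $(seq 1 10); do
    ./bandwidthTest --mode=shmoo --csv
done | awk -F', *' '
    /^bandwidthTest-/ {            # result rows, e.g. "bandwidthTest-D2D, Bandwidth = 19977.6 MB/s, ..."
        split($2, b, " ")          # b[3] holds the numeric bandwidth value
        sum[$1] += b[3]; n[$1]++
    }
    END {
        for (t in sum)
            printf "Average for %s is %.2fMB/s\n", t, sum[t] / n[t]
    }'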

Thank you, look forward to your investigation. Let me know if you need anything else.

Hi,

We will check this with our internal team and will update you with more information later.

Hi,

We found this issue may be related to the board type.
Could you share your board version with us?

For example, our board is C03.
Thanks.

Hmm, I think I misunderstand board type. I was under the impression that all TX2’s were C03.

How do I get the board version? I don’t see C0X or A00 anywhere on the label. Is there a way to access it from SW?

Hi,

Would you please provide the following information for both platforms?

  1. Kernel log and boot log
  2. Serial number, which is on a label along with the 699 part number (please check the attachment)
  3. ID-EEPROM contents
  4. tegrastats output while executing the sample

Thanks.
serial.png

Serials were provided above, but they’re here for reference:
Good serial: 0424518014359
Bad serial: 0424718090774

Here are directories containing the kernel boot log, EEPROM dump (which contains the above serials + 699 part #), test output, and tegrastats output.
Good Jetson Data
Bad Jetson Data

Interestingly enough, looking at the manufacturing stepping in the EEPROM dump, the good one appears to be E0 and the bad one B0. I hope this isn’t a HW bug that was fixed in a later stepping.

Look forward to your reply.

Hi,

The serial number you shared is the board serial number, but we need the one for the GPU module.

Would you mind rechecking that? The module serial number should begin with 699xxxxx.

Thanks.

The EEPROM dumps provided in the links above contain the serial # starting with 699 (at address 0x14), along with the full EEPROM contents.
For convenience, though, here they are:
Good: 699-83310-1000-B02 E.0
Bad: 699-83310-1000-D01 B.0

They’re also in my comments in the posts above.
Let me know if you need anything else. Thanks.

Hi,

Thanks for the log information.

We have passed it to our internal team and will update you once we get their feedback.

Thanks.

Hi,

E.0 and B.0 indicate different manufacturing revisions, but we are still checking what the difference between them is.

Thanks.

Yeah, that’s what I said worried me in the previous post. Hopefully there’s a SW workaround if that’s the root cause.

Let’s see what you guys find.

Thanks!

Hi greg2,

Could you also share the tegrastats output while running the test?

Hi, it’s already in the same two folders as the rest of the data.

The files are called “tegra_stats_during_good_run” and “tegra_stats_during_bad_run”.

So to recap the contents that were requested and that we shared there:

  • EEPROM data (including jetson case serial number AND 699 serial number)
  • tegrastats output
  • kernel logs
  • Test output

Thanks

Hi greg2,

Did you run tegrastats with sudo? I don’t see the temperature info.
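
If not, re-running it with sudo should include the thermal readings, for example (the tegrastats binary location varies by L4T release; on TX2 dev kits of this vintage it is typically under /home/nvidia):

sudo /home/nvidia/tegrastats | tee tegra_stats_during_run.log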