Hello,
We have noticed a significant performance difference across different Jetson TX2 modules: our application runs well on some units but noticeably slower on others. In the process of isolating the cause, it appears to be a physical difference between the modules. We've observed a ~15% performance difference under identical software and hardware conditions (we flashed the same setup to two different Jetsons back-to-back and the variance persisted). We initially noticed the difference in our product, but the behavior is exactly the same on the dev-kit carrier board with our software removed. The difference isn't limited to our application; the CUDA samples show the same performance gap.
We've measured this by running the bandwidthTest sample that ships with CUDA, under /usr/local/cuda-8.0/samples/1_Utilities/bandwidthTest.
The command used is: ./bandwidthTest --mode=shmoo --csv
Full results are attached; a relevant snippet is below.
Faster unit:
bandwidthTest-H2D-Pinned, Bandwidth = 20145.7 MB/s, Time = 0.00318 s, Size = 67186688 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 36264.4 MB/s, Time = 0.00177 s, Size = 67186688 bytes, NumDevsUsed = 1

Slower unit:
bandwidthTest-H2D-Pinned, Bandwidth = 17135.0 MB/s, Time = 0.00374 s, Size = 67186688 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 31506.6 MB/s, Time = 0.00203 s, Size = 67186688 bytes, NumDevsUsed = 1
There appears to be a ~17% difference in the H2D pinned test and a ~15% difference in the D2D test (relative to the slower unit).
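For reference, those percentages can be reproduced directly from the bandwidth figures in the snippet above (a quick sketch; only the numbers quoted in this post are used):

```python
# Percent gap between the two modules, computed from the
# bandwidthTest --csv figures quoted above (MB/s).
results = {
    "H2D-Pinned": (20145.7, 17135.0),  # (faster unit, slower unit)
    "D2D": (36264.4, 31506.6),
}

for test, (fast, slow) in results.items():
    pct = (fast - slow) / slow * 100  # gap relative to the slower unit
    print(f"{test}: {pct:.1f}% faster on the good unit")
# H2D-Pinned: 17.6% faster on the good unit
# D2D: 15.1% faster on the good unit
```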
We are running both units in nvpmodel mode 0.
Is this variance expected, or could something be wrong with some of the Jetson modules we are using? We have a "good" group and a "bad" group; it's hard to say exactly how many units are affected at this point. What bandwidth is expected from this test under default L4T conditions with no other high-load applications running? Please help us understand our observations. Thank you.