I encountered a strange issue. I unboxed my Jetson TX2 and ran the .sh file (that comes shipped with the TX2) to install Ubuntu. Then I ran the program below; it executed in 20 microseconds. The compilation command did not include any architecture-specific options like -march or -mtune. Then I flashed the TX2 with JetPack (JetPack-L4T-3.1-linux-x64) and ran the program again; this time it takes 200 microseconds. No other programs are running, just whatever daemons run in the background after flashing.
Why is this 10X slower? How do I get back to the original installation? I did not copy it, my mistake. Please help.
Compilation command:
g++ main.cc -o main -std=c++11 -O3
Script:
#include <chrono>
#include <cstdint>   // uint32_t
#include <iostream>
#include <vector>

int main()
{
    std::vector<float> v1(100000, 2.0f);
    std::vector<float> v2(100000, 1.5f);
    std::vector<float> v3(100000, 1.5f);

    // Time a single pass of the element-wise multiply.
    auto tick = std::chrono::high_resolution_clock::now();
    for (uint32_t i = 0; i < 100000; ++i)
        v3[i] = v1[i] * v2[i];
    auto tock = std::chrono::high_resolution_clock::now();

    // Elapsed time in microseconds.
    std::cerr << std::chrono::duration_cast<std::chrono::microseconds>(tock - tick).count() << "\n\n";
    return 0;
}
@Honey_Patouceul:
Yes, I had run the jetson_clocks.sh script but not nvpmodel -m0. I ran the program after issuing both of these commands; it now takes ~150 microseconds.
@AastaLLL:
Yes, I am getting the same results as well.
I am starting to rethink whether my initial results were somehow wrong (maybe I ran the for loop for 10K indices; I probably had a typo in the code):
0.1 million ops in 20 usec means in 1 sec: 0.1 x 10^6 x 10^6 / 20 = 5 x 10^9 ops per sec
0.1 million ops in 150 usec means in 1 sec: 0.1 x 10^6 x 10^6 / 150 = 0.67 x 10^9 ops per sec
Do you guys think the former is even possible? I have not calculated the peak flops (cycles/sec x flops/cycle).
I am now getting between ~80 and ~110 microseconds. Thank you for that. This makes me wonder whether it is at all possible to extract more performance out of the ARM cores.
Is there something else I can do to boost the performance of the ARM cores?
Sorry for my typo. Max-N is mode 0, not mode 2. (I have already corrected the information in comment #6.)
Here is the nvpmodel information:
The maximum frequency of the TX2 CPU is 2.0 GHz (both A57 and Denver).
jetson_clocks.sh will lock the CPU frequency to 2.0 GHz (max) and give users the best performance.