I have two TX2 boards: one runs L4T R28.2.1, the other L4T R32.1. I wrote a simple test program and ran it on both; it ran about twice as fast on L4T R28.2.1 as on L4T R32.1.
#include <chrono>
#include <functional>
#include <iostream>
#include <string>  // std::string is used by benchmark()

// Current wall-clock time in milliseconds since the epoch.
double duration() {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
}

// Run f_action once and print the elapsed time in milliseconds.
void benchmark(const std::string &label, std::function<void()> f_action) {
    double start = duration();
    f_action();
    std::cout << label << " " << duration() - start << "ms" << std::endl;
}

int main() {
    int i = 0;
    int b = 0;
    benchmark("test cpu run", [&]() {
        // Note: b overflows int at this loop count (undefined behavior).
        for (; i < 100000000; i++) {
            b += i;
            int c = 15;
            c = b + c + i;
        }
    });
    return 0;
}
On R28.2.1: 315ms
On R32.1: 729ms
Both boards were set up with jetson_clocks and nvpmodel 0. The newer release should be at least as fast as the older one. I searched but found no similar topics.
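A single timed run like the one above is sensitive to background load and to what the optimizer does with the loop. Below is a minimal sketch of a more repeatable measurement, not the code that produced the numbers in this thread. The names time_ms, median_ms and cpu_loop are hypothetical helpers introduced here. It assumes a monotonic clock (std::chrono::steady_clock), several runs with the median reported, unsigned 64-bit arithmetic to avoid signed overflow, and a volatile sink so the work is not optimized away.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Time one call of `action` in milliseconds using a monotonic clock.
template <typename F>
double time_ms(F&& action) {
    const auto start = std::chrono::steady_clock::now();
    action();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Run `action` `runs` times and return the median time in milliseconds
// (the upper median when `runs` is even).
template <typename F>
double median_ms(F&& action, int runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int r = 0; r < runs; ++r) {
        samples.push_back(time_ms(action));
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Hypothetical stand-in for the loop in the post: sums 0..n-1.
// The volatile sink forces every iteration to be executed, so this is
// slower than the original loop and its timings are not comparable to
// the numbers quoted in the thread.
std::uint64_t cpu_loop(std::uint64_t n) {
    volatile std::uint64_t sink = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        sink = sink + i;
    }
    return sink;
}
```

Calling median_ms([] { cpu_loop(100000000); }, 5) from main on both boards and comparing the medians should make a one-off disturbance, such as a background process, less likely to skew the result.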
For me the results are:
$ g++ main.cpp
test cpu run 425ms
test cpu run 375ms
test cpu run 412ms
test cpu run 370ms
test cpu run 357ms
test cpu run 412ms
$ g++ -O3 main.cpp
test cpu run 158ms
test cpu run 147ms
test cpu run 144ms
test cpu run 131ms
test cpu run 154ms
That is on JetPack 4.2, so I get about the same speed as you do on 3.3. It is slightly slower, but there could be other factors. Check tegrastats on 4.2 and see what frequencies you get.
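If it helps to log the CPU clocks from inside the test program itself, here is a rough sketch. It is not a replacement for tegrastats and does not use any NVIDIA API. It assumes the kernel exposes the standard Linux cpufreq sysfs file scaling_cur_freq for each core. The helper names read_khz and cur_freq_path are made up for this example.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Read a single integer (frequency in kHz) from a cpufreq sysfs file.
// Returns -1 if the file is missing or cannot be parsed.
long read_khz(const std::string& path) {
    std::ifstream in(path);
    long khz = -1;
    if (!(in >> khz)) {
        return -1;
    }
    return khz;
}

// Build the standard Linux cpufreq path for one core.
// Assumption: the board's kernel provides this file.
std::string cur_freq_path(int cpu) {
    assert(cpu >= 0);
    return "/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
           "/cpufreq/scaling_cur_freq";
}
```

On a TX2 you would call read_khz(cur_freq_path(n)) for cores 0 through 5 just before and just after the benchmark. A core that is offline or has no cpufreq entry returns -1 here.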
Thanks Dalus for your test. I found that a background process was running during my test, so that could have been the problem.
But I still see a performance issue. After restarting both TX2s, I ran another test using the performance_gpu sample from OpenCV on both boards (same OpenCV 3.4.0, built with the same configuration, both set up with jetson_clocks and nvpmodel 0). Most of the tests are slower on JP42 than on JP33, and some are drastically slower.
This also includes tegrastats logs to confirm that both boards are in maximum performance mode.
I’m considering downgrading my JP42 TX2 to JP33 so that my app behaves consistently during development, but I’m still hoping for some ideas on how to solve this problem so I can investigate further.
Yes, it’s the same now with the first simple test (the difference came from a background app running at test time). I added another test, with results from both TX2s, that shows different performance on each JP version. Do you think the problem is that OpenCV is not optimized for JP42?
I tried again with another sample, from GitHub - dhernandez0/sgm: Semi-Global Matching on the GPU, and got the same results on both TX2s. Probably OpenCV 3.4.0 is not optimized for the latest JetPack, so I will downgrade my TX2 to continue my work.
I also experienced a performance degradation in JetPack 4.2, which is how I found this post. When I ran YoloV2 on JetPack 3.3 I used to get 8-10 FPS; now on JetPack 4.2 I get barely 3 FPS. Similarly, YoloV3 used to give around 3 FPS on JetPack 3.3, while on 4.2 the memory fills up and the process dies with a segmentation fault. I have many more cases where I see this performance issue. I use OpenCV in all of them. I have tried both OpenCV 3.4.0 and OpenCV 4.0.0, and there is no major difference in performance. It would be very helpful if someone could diagnose this issue.
There are some similar discussion threads about Yolo performance issues.
Please also check Topic 1060789 and Topic 1061155 for reference.
However, please try manually reducing the network resolution in the first few lines of yolov3.cfg; you might see a performance improvement.
For example,
width=416
height=416
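To get a rough feel for what a smaller network resolution buys, here is a small sketch. It assumes, as an approximation only, that the convolution work scales with the number of input pixels, and that width and height must be multiples of 32 as the darknet YOLO configs expect. The function names and the 608x608 starting point in the usage note are hypothetical, not taken from this thread.

```cpp
#include <cassert>

// Check that a candidate resolution is positive and a multiple of 32.
// Assumption: this matches what the YOLO .cfg files expect.
bool valid_yolo_size(int width, int height) {
    return width > 0 && height > 0 && width % 32 == 0 && height % 32 == 0;
}

// Ratio of input pixel counts, used as a rough proxy for relative
// compute cost. This is only an estimate, not a measured speedup.
double relative_cost(int new_w, int new_h, int old_w, int old_h) {
    assert(valid_yolo_size(new_w, new_h) && valid_yolo_size(old_w, old_h));
    return (static_cast<double>(new_w) * new_h) /
           (static_cast<double>(old_w) * old_h);
}
```

For instance, relative_cost(416, 416, 608, 608) is roughly 0.47, so going from a hypothetical 608x608 input to 416x416 would cut the per-frame input to a bit under half. The actual FPS gain has to be measured on the board.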
Since JetPack 4.3 has now been publicly released, could you please upgrade to the latest JetPack release to confirm?
You should also start another discussion thread for further support.
Thanks