I started working with the Jetson TX1 and ran into some problems with CPU performance, so I investigated in more detail. I wrote a small program for testing CPU performance (it is just matrix multiplication). I will attach the code.
- I added an OpenMP pragma to the code so that it uses all cores, and compared the performance of TX1 with TK1. I got the following results.
Matrix dimension: 1000
Number of cores used - execution time (sec)
TX1:
1 - 8
2 - 6.2
4 - 4.6
TK1:
1 - 16
2 - 8.2
4 - 4.8
As we can see, using 4 cores gives a larger speedup on TK1 than on TX1: 3.33x versus 1.74x. This is very strange, because matrix multiplication parallelizes well. I then tried increasing the size of the task.
Matrix dimension: 1500
Number of cores used - execution time (sec)
1 - 100
2 - 51
4 - 28
Here the speedup is good, but I don't understand why. Maybe a dimension of 1000 is too small a task for TX1. Do you have any ideas about this?
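For reference, the speedup figures quoted above are simply the single-core time divided by the multi-core time. Here is a minimal sketch of that arithmetic in Python. It assumes the first group of 1000x1000 timings (8, 6.2, 4.6 s) is TX1 and the second (16, 8.2, 4.8 s) is TK1, which is what the quoted 1.74 and 3.33 imply. The board for the 1500x1500 run is not stated, so it is labeled that way. The function name `speedups` is only illustrative.

```python
# Execution times in seconds, keyed by the number of cores used.
# Assumption: for n=1000 the first group of results is TX1, the second TK1.
timings = {
    "TX1, n=1000": {1: 8.0, 2: 6.2, 4: 4.6},
    "TK1, n=1000": {1: 16.0, 2: 8.2, 4: 4.8},
    "n=1500 (board not stated)": {1: 100.0, 2: 51.0, 4: 28.0},
}

def speedups(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core time."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in times.items()}

for label, times in timings.items():
    for cores, (s, e) in sorted(speedups(times).items()):
        print(f"{label}: {cores} core(s) -> speedup {s:.2f}x, efficiency {e:.0%}")
```

Efficiency here is the speedup divided by the number of cores, so a value close to 100% means the extra cores are used almost fully.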
- After this I ran another test. The taskset program sets the CPU affinity of a program, so I used it to launch instances of the matrix multiplication pinned to separate cores. Each instance uses only one core, and each CPU core runs only one instance.
Number of launched instances - execution time of one instance (sec)
TX1:
1 - 8
2 - 9.8
3 - 13.3
4 - 26
TK1:
1 - 16
2 - 16.7
3 - 17.5
4 - 18.9
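For anyone who wants to reproduce the pinned test, here is a rough sketch of how the instances could be launched from Python. It is not the exact procedure used for the numbers above. The binary path `./matmul` is hypothetical (a single-threaded build of the matrix multiplication test), and the helper names are made up for illustration. Only the standard `taskset -c <core> <command>` form is used.

```python
import subprocess

# Hypothetical path to the single-threaded matrix multiplication binary.
BENCHMARK = "./matmul"

def pinned_commands(num_instances, benchmark=BENCHMARK):
    """Build one `taskset -c <core> <benchmark>` command per instance."""
    return [["taskset", "-c", str(core), benchmark]
            for core in range(num_instances)]

def run_pinned(num_instances, benchmark=BENCHMARK):
    """Start all instances at once, then wait for every one to finish."""
    procs = [subprocess.Popen(cmd)
             for cmd in pinned_commands(num_instances, benchmark)]
    return [p.wait() for p in procs]

if __name__ == "__main__":
    # Only print the commands here; call run_pinned(4) to actually launch them.
    for cmd in pinned_commands(4):
        print(" ".join(cmd))
```

The sketch assumes the benchmark prints its own execution time, so each instance reports its result separately.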
These are very strange results. Per-core performance drops by more than 3 times! Could anyone try to reproduce these results on their own TX1? Or can anyone recommend a way to avoid these problems? This test reproduces a real case, a large system running on the Jetson TX1, and right now that system has these problems.
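To put a number on the slowdown, the per-instance times can be compared with the single-instance time. This is a small sketch of that calculation. It assumes the first group of results (8, 9.8, 13.3, 26 s) is TX1 and the second (16, 16.7, 17.5, 18.9 s) is TK1, matching the single-core times from the first test. The function names are illustrative only.

```python
# Per-instance execution time in seconds, keyed by number of running instances.
# Assumption: the first group of results is TX1 and the second is TK1.
per_instance = {
    "TX1": {1: 8.0, 2: 9.8, 3: 13.3, 4: 26.0},
    "TK1": {1: 16.0, 2: 16.7, 3: 17.5, 4: 18.9},
}

def slowdown(times):
    """How many times longer one instance takes than when it runs alone."""
    base = times[1]
    return {n: t / base for n, t in times.items()}

def throughput_gain(times):
    """Total work finished per second, relative to a single instance."""
    return {n: n / s for n, s in slowdown(times).items()}

for board, times in per_instance.items():
    s, g = slowdown(times), throughput_gain(times)
    for n in sorted(times):
        print(f"{board}: {n} instance(s) -> slowdown {s[n]:.2f}x, "
              f"total throughput {g[n]:.2f}x")
```

With four instances the first group slows down by 26 / 8 = 3.25 times per instance, which is the "more than 3 times" figure above.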
I also use the scripts for maximizing performance from here: http://elinux.org/Jetson/TX1_Controlling_Performance
On the Jetson TX1 I use JetPack 2.3.
I got very strange results! Does anyone have any idea about these problems?