I am experimenting with power and performance on the Jetson-TK1 using a simple video encoder as the workload. When I start my encoder, it sleeps for five seconds using usleep() (http://linux.die.net/man/3/usleep) before the frame encoding starts. However, when the function returns, the frame encoding runs much slower than if I hadn’t called usleep(). After a random amount of time, the encoding speed picks up again.
I attach a plot that illustrates the problem - it is very easy to see that the frame encoding is very fast if I don’t call usleep first.
The processor frequency governor is set to "performance" and the frequency does not change, as can be seen in the plot.
The high-performance cluster is always active while the encoder runs.
I have several threads that call usleep, where this is not a problem. It is only in the main thread that this is a problem.
The latency before the encoding speed picks up after a call to usleep() varies. Sometimes it is 2 seconds, and sometimes it is 20.
One thing I tend to think about when usleep hits is context switches to other threads/processes…and in turn the possibility that the cache has been thrown out. Constantly losing and refilling the cache, with cache misses in between, would be a serious slowdown. I’m not sure how you would test this though…perhaps if you can assign cpu affinity so one core is dedicated to your app.
I suspect this is more of a linux question and ARM is unrelated (unless you are writing kernel scheduler code in assembler). As a start see /usr/include/sched.h and pthread.h, search for “affinity”.
Affinity C functions are for your own code…there may also be ways to work with this in /proc or /sys without modifying your own program, but I have not worked with this.
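As a rough illustration of the affinity idea, here is a minimal sketch that pins the calling thread to one core with sched_setaffinity() from sched.h. The helper name pin_to_cpu is made up for this example, and which core to pick is an assumption you would need to check against your own board and workload.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t, CPU_SET and sched_setaffinity */
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: pin the calling thread to a single CPU core.
 * Returns 0 on success, -1 if the kernel rejects the request. */
int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* A pid of 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```

Calling something like pin_to_cpu(1) at the top of main(), before the usleep(), would keep the main thread on that core. This only stops the thread from migrating; other tasks can still run on the same core and evict its cache while it sleeps.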
In the past I have used “usleep(0)” to intentionally give the system a hint that this is a good place to allow other scheduled tasks to run. If anything does happen to take that cpu expect cache clearing and loading to perhaps end up as part of the latency unless your process/thread is the only thing on that cpu. usleep is not normally considered part of scheduling but the system will make use of that time and context switch. Consider that running your thread/process on a different cpu after the usleep pretty much guarantees added latency, but even on the same cpu the same issue will result if some other process changes cache during usleep…so affinity won’t protect cache, it will only offer an opportunity to keep cache.
I can’t guarantee cache is causing your issues (since there is all kinds of power management in Jetson), but it is likely at least part of the problem. Your first “easy” step would simply be checking the forum threads here on stopping power throttling. Many people on this forum have found cpu frequency scaling for power saving to be a problem for them as well. Performance suffered, power saving features were disabled, performance went back up. If you turn off power management/cpu scaling though, your cpu use goes back up.
If your application were designed to actually turn power saving features off when returning from sleep, and then turn power saving features back on before starting sleep, then you would be able to improve both performance and power consumption. Add cpu affinity if you want to maximize everything.
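One way an application could toggle such settings itself is by writing to the relevant sysfs files around the sleep. The sketch below is only a generic helper for that; the name write_setting is invented here, and which files and values actually control power saving on a given L4T release is something you would have to verify on your board.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a short string to a sysfs-style file.
 * Returns 0 on success, -1 on error. Usually needs root. */
int write_setting(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }

    int ok = (fputs(value, f) >= 0);

    /* fclose() flushes the buffer, so a value the kernel rejects
     * typically shows up as an error here rather than in fputs(). */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

For example, the standard Linux cpufreq interface exposes the governor at /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor, so write_setting() with that path and "performance" after waking, and with a power-saving governor before sleeping, would follow the pattern described above. The governors available on a particular kernel are listed in scaling_available_governors in the same directory.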
I found out the reason behind this issue. The problem is that, sometimes, the EMC clock is stuck at the lowest frequency for some time (often up to a second) after usleep(). Seems like the EMC clock doesn’t react quickly to increased memory utilisation…
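For anyone who wants to check this on their own board, a small helper like the one below can poll a clock rate from a text file right after usleep() returns. The function name read_rate_hz is made up for this sketch, and the file to read is an assumption: on the L4T releases I am aware of for TK1 the EMC rate appears under debugfs as /sys/kernel/debug/clock/emc/rate, but please verify the path on your system (debugfs normally requires root).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read a single integer (for example a clock
 * rate in Hz) from a text file. Returns -1 if the file cannot be
 * opened or does not start with a number. */
long long read_rate_hz(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long long hz = -1;
    if (fscanf(f, "%lld", &hz) != 1)
        hz = -1;

    fclose(f);
    return hz;
}
```

Logging the returned value once per encoded frame, next to the frame time, should show whether the slow frames line up with the memory clock sitting at its lowest rate.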
Yes, and indeed this is the only solution to the problem…
I would recommend against setting it to the lowest speed, however, as this sometimes crashes my board while using CUDA and I have to reboot. (The lowest safe speed for me is around 40 MHz; I don’t remember the exact number.)