Unpredictable (random?) CPU allocation on TX2

  1. My experience:
    Even though I run exactly the same program, the profiling time varies a lot.
    Say I run it 10 times: 6 runs take 80 ms and 4 runs take 130 ms.

  2. About my program: there is a main thread plus 1 extra thread that I create myself.

  3. My analysis:
    I have already set the TX2 to maximum performance, which means all 6 cores should be active.
    Even though I explicitly use only two threads in the program, when I open the system monitor I can see all 6 cores running actively.

The TX2 has 6 CPU cores (4 A57 cores and 2 Denver cores). Usually one Denver core runs at around 90%-100% while all the other cores run at about 30%, but each time it is a little different: sometimes Denver core 1 shows the high usage, sometimes Denver core 2. I can’t really find a pattern between the 6 cores and the profiling time.

Is there any method, such as manipulating the CPU allocation, to make sure the profiling time is always 80 ms (the fast case), given that I have already set the board to maximum performance (sudo nvpmodel -m 0)?

Hi heyworld,

Not sure if pthread_setaffinity_np() helps in your case.

http://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
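A minimal sketch of how it could be used, assuming the extra thread should be pinned to one specific core (CPU 3 here is just an example; check lscpu or /proc/cpuinfo for your core layout):

```c
/* Sketch: pin a worker thread to one CPU with pthread_setaffinity_np().
 * CPU 3 is only an example; pick the core you actually want. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    /* ... the profiled work goes here ... */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    cpu_set_t set;

    pthread_create(&tid, NULL, worker, NULL);

    CPU_ZERO(&set);
    CPU_SET(3, &set);                      /* allow only CPU 3 */
    int err = pthread_setaffinity_np(tid, sizeof(set), &set);
    if (err != 0)
        fprintf(stderr, "pthread_setaffinity_np: %d\n", err);

    pthread_join(tid, NULL);
    return 0;
}
```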

Hi Vickyy,

I am using sched_setaffinity. I tried setting the affinity so that only the two Denver cores are allowed, but it does not seem to help.

Actually, I need to know what the CPU pattern is when the profiling time is 80 ms, but that pattern is not very clear to me. That is why I don’t know how to allocate CPUs to threads (i.e., which CPU core for which thread). A rough idea of how I could log that is sketched below.
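One way to look for that pattern is to record, for every run, how long the measured section took and which core it actually executed on. This is only a sketch; run_workload() is a placeholder for the code being profiled:

```c
/* Sketch: log per-run duration and the CPU the thread ended up on,
 * to correlate the 80 ms vs 130 ms runs with the core that ran them. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>

static void run_workload(void) { /* ... the code being timed ... */ }

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    for (int i = 0; i < 10; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        run_workload();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("run %d: %.1f ms on CPU %d\n",
               i, elapsed_ms(t0, t1), sched_getcpu());
    }
    return 0;
}
```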

You can set affinity via process ID for a user process, or via IRQ for a driver. On the other hand, if that core is not capable of handling the request, the scheduler will move it to another core which can handle it. Add to this that on a non-hard-realtime system affinity is a strong hint rather than an unbreakable order: if other threads of higher priority need that core, then it is possible the process will still get bumped to another core (and the more your process hogs that core, the more the pressure builds to give other processes access as well). In the case of varying latencies, also know that having your process on that core does not stop other processes from using that core. Everything else would need to be told to mask out that core in its affinity if you want only your process on that core.
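As an illustration of the IRQ side of this, interrupt affinity can be changed through /proc/irq/&lt;N&gt;/smp_affinity (root required). The IRQ number below is hypothetical; look up the real one in /proc/interrupts, and note that some IRQs refuse to be migrated:

```c
/* Sketch: keep a device interrupt off core 0 by writing a CPU bitmask to
 * /proc/irq/<N>/smp_affinity. The IRQ number 123 is hypothetical; find the
 * real one in /proc/interrupts. 0x3e = binary 111110, i.e. CPUs 1-5 only.
 * Must run as root, and not every IRQ can be moved. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/irq/123/smp_affinity", "w");
    if (!f) {
        perror("open smp_affinity");
        return 1;
    }
    if (fprintf(f, "3e\n") < 0 || fflush(f) != 0)
        perror("write smp_affinity");
    fclose(f);
    return 0;
}
```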

Are you simply trying to get a consistent low latency? If so, then this cannot be guaranteed on a non-hard-realtime system. You will also get variations due to different data, e.g., cache hits or misses. Very likely, if there is disk access or other data access via I/O going through core 0, then there will be a delay while waiting for data. You might be setting affinity correctly and still get varying results.

You could perhaps increase the priority of your process (a more negative nice number), but that only goes so far. If you set affinity to a core other than core 0, then this might work well (a lot of I/O goes through core 0; it would be a very bad idea to starve other threads/drivers of core 0).
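A sketch of that combination, assuming the cores are numbered 0-5; the nice value (-10) is just an example, and a negative nice value needs root or CAP_SYS_NICE:

```c
/* Sketch: raise this process's priority and keep it off core 0.
 * Nice value -10 and the allowed cores (1-5) are only examples. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    if (setpriority(PRIO_PROCESS, 0, -10) != 0)   /* needs root/CAP_SYS_NICE */
        perror("setpriority");

    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 1; cpu <= 5; cpu++)            /* everything except core 0 */
        CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");

    /* ... run the profiled work here ... */
    return 0;
}
```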