Threads running on an asymmetric system: Parker SoC has two Denver cores and four ARM Cortex-A57 cores

I don’t know whether the Denver cores lack NEON, but the ARMv8-A spec makes NEON mandatory, so I doubt this is the issue. Extensions that were optional in 32-bit ARMv7-A, such as NEON and floating point, are not optional in 64-bit ARM. However, something you are quite possibly running into is cache performance. If a thread migrates to another core, it will most likely get cache misses the first time it runs there, because its data is still sitting in the previous core's cache. A thread that keeps running on the same core tends to get cache hits.
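One way to check whether migration is actually happening is to sample which CPU the thread is running on. This is a sketch assuming Linux with glibc (`sched_getcpu` needs `_GNU_SOURCE`); `count_migrations` and its busy-work loop are hypothetical helpers for illustration, not part of any library.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: count how often the calling thread is observed on a
 * different CPU than in the previous sample. Each change is a migration,
 * after which the thread's working set is likely cold in the new core's cache. */
int count_migrations(long samples)
{
    int migrations = 0;
    int last = sched_getcpu();            /* returns -1 if unsupported */
    volatile unsigned long sink = 0;      /* keeps the busy loop from being optimized out */

    for (long i = 0; i < samples; i++) {
        for (int j = 0; j < 10000; j++)   /* a little busy work between samples */
            sink += (unsigned long)j;

        int cpu = sched_getcpu();
        if (cpu >= 0 && cpu != last) {    /* ignore failed reads */
            migrations++;
            last = cpu;
        }
    }
    return migrations;
}
```

If `count_migrations(100000)`, called from the thread in question, returns a large count, the scheduler is moving that thread between cores, possibly between the Denver and A57 clusters.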

If other processes run on the same core and their data evicts yours from that core's cache, you can still get cache misses even though your thread never leaves that core.

It might get complicated, but you could consider assigning core affinity, along with denying that core to other processes (never do this with CPU0, the first core, since it handles hardware interrupts).
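Assigning core affinity from inside the program could look roughly like the sketch below. It assumes Linux/glibc `sched_setaffinity`; `pin_to_cpu` and `first_allowed_cpu` are hypothetical helper names. Which logical CPU numbers correspond to the Denver cores on a particular board is an assumption you would need to check (for example in `/proc/cpuinfo`) rather than hard-code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread (pid 0) to a single CPU.
 * Returns 0 on success or the errno value on failure. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return errno;
    return 0;
}

/* Hypothetical helper: lowest-numbered CPU the calling thread is currently
 * allowed to run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Calling `pin_to_cpu(n)` at the start of the worker thread only restricts that thread; it does not keep other processes off core `n`, which still requires something like cgroups.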

About cgroups and core affinity:

Note that if you have gone far enough to set up affinity, you can also set up scheduling priority.
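Setting priority could be sketched as follows, assuming the Linux real-time policy `SCHED_FIFO` set through `sched_setscheduler`; `try_set_fifo` is a hypothetical helper. This normally requires root or `CAP_SYS_NICE`, so the helper returns the errno value instead of aborting.

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: switch the calling thread (pid 0) to SCHED_FIFO at
 * the given priority. Returns 0 on success or the errno value, typically
 * EPERM when the process lacks the required privilege. */
int try_set_fifo(int priority)
{
    struct sched_param param = { .sched_priority = priority };
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return errno;
    return 0;
}
```

For example, `try_set_fifo(sched_get_priority_min(SCHED_FIFO))` requests the lowest real-time priority. A `SCHED_FIFO` thread that never blocks can starve normal-priority tasks sharing its core, which is one more reason to keep it off CPU0.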