I’ve written a performance-critical piece of code that relies on ARM NEON instructions. The target hardware is the Magic Leap One (ML1), which is built on an NVIDIA Parker SoC. I have to multithread the application to maintain performance, but the timing varies depending on which Unity worker thread picks up the job. When the job lands on thread 2, my code runs in 29 to 33 ms; when it lands on thread 1, it takes 57 ms.
My theory is that thread 2 happens to run on an ARM Cortex-A57 core with NEON, while thread 1 runs on a Denver core without NEON. What is expected to happen when code compiled with NEON runs on a core that does not have NEON? The code does run, so some emulation must be going on. Is the OS supposed to schedule NEON code onto cores that support NEON, or is that something the developer has to handle manually?
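One way to test this theory is to pin a thread to each core in turn and time the same kernel on every core. The sketch below assumes the ML1 OS exposes the standard Linux affinity API (`sched_setaffinity`, `sched_getcpu`), which is an assumption. `workload` is a hypothetical scalar placeholder; you would swap in the actual NEON routine. `time_on_cpu` is likewise a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <time.h>

/* Hypothetical placeholder: replace with the real NEON kernel. */
static volatile float sink;
static void workload(void) {
    float acc = 0.0f;
    for (int i = 0; i < 10000000; ++i)
        acc += (float)i * 0.5f;
    sink = acc; /* keep the compiler from discarding the loop */
}

/* Pin the calling thread to `cpu`, run workload() once, and return the
 * elapsed wall time in milliseconds, or -1.0 if the thread cannot be
 * pinned there (CPU offline, out of range, or not permitted). */
double time_on_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1.0;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" on Linux. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    workload();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

For example, you could call `time_on_cpu(cpu)` for each CPU index from 0 up to `sysconf(_SC_NPROCESSORS_CONF) - 1` and print the results. If the slow timings cluster on specific CPU indices, that would point to the core type rather than to Unity's worker threads.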