One thing to consider is that Linux itself was never meant to be “hard” real time. If you cut out enough of the general-purpose desktop environment you’d stand a chance of getting close to real time, but you still would not get guaranteed hard real time.
One limitation I see is the inability to spread hardware interrupts over multiple cores. This means that only CPU0 can handle interrupts from hardware devices, and that means all hardware devices, so every driver allowed in must be minimal and well optimized if they are all to cooperate on a single core. If one driver hogs CPU0, the rest suffer. Other software can be spread over several CPU cores, but if something urgent arrives from a hardware device while CPU0 is busy with another handler, CPU0 just won’t immediately drop what it is doing to service it.
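To check how interrupts are actually distributed on a given board, you can sum the per-CPU columns of /proc/interrupts. Below is a minimal Python sketch of that idea. The function name `per_cpu_irq_totals` and the `SAMPLE` text are made up for illustration (the numbers are not measurements from any real system), and the parsing assumes the usual layout of a header row of CPU names followed by rows of counts.

```python
def per_cpu_irq_totals(text):
    """Sum interrupt counts per CPU from /proc/interrupts-style text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        _label, _, rest = line.partition(":")
        # Count columns come first; the controller/device description follows.
        for cpu, field in zip(cpus, rest.split()):
            if not field.isdigit():
                break                    # reached the description text
            totals[cpu] += int(field)
    return totals


# Hypothetical sample text, for illustration only.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 29:     120034          0          0          0     GICv2  29 Level     arch_timer
 73:       5021          0          0          0     GICv2  73 Level     mmc0
IPI0:       310        420        390        405       Rescheduling interrupts
Err:          0
"""

if __name__ == "__main__":
    for cpu, count in per_cpu_irq_totals(SAMPLE).items():
        print(cpu, count)
```

On a live system you would pass in `open("/proc/interrupts").read()` instead of the sample. If nearly all of the device interrupt counts land in the CPU0 column, you are looking at the situation described above.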
The fast IRQ (FIQ) is designed to aid real time, but it is not generally available in Linux by design (the scheduler does not know about FIQ). The idea behind FIQ is that it is serviced immediately, and that its vector is the last entry in the vector table, so handler code can begin right at the vector without needing a branch instruction. Like other interrupts, this is on CPU0, but it has hardware-enforced priority. If you add the ability for some drivers/hardware to generate and service FIQ then you could definitely get some improved latency from this. Unfortunately, you’re probably also going to interfere with some of the other interrupt handling, which was not designed to have control forcibly yanked away from a regular IRQ by an FIQ.
On an Intel multi-core CPU normally only CPU0 could handle hardware interrupts, just like the ARM multi-core CPU. However, there is some hardware assist: with an I/O advanced programmable interrupt controller (IO-APIC), hardware IRQs can be spread across cores. This is an extreme boost in reducing latency, with systems able to remain smooth even under an extreme hardware IRQ rate/load (IRQ starvation occurs much earlier under load without an IO-APIC than with one). When not enough IRQs are occurring to need multiple cores, it takes slightly more time to use the IO-APIC than to run without it, but as the interrupt request rate goes up (meaning more hardware drivers needing a time slice), the IO-APIC version just doesn’t slow down until it reaches a much higher load than the CPU without the IO-APIC can handle.
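On Linux systems where IRQs can be routed to more than one core, the routing is exposed as a hexadecimal CPU bitmask in /proc/irq/<N>/smp_affinity (bit 0 is CPU0, bit 1 is CPU1, and so on). Here is a small Python sketch of converting between CPU lists and that mask format. The helper names `affinity_mask` and `cpus_from_mask` are my own and not part of any library, and whether a write to that file is actually honored depends on the interrupt controller and the driver, so treat this as an illustration of the format only.

```python
def affinity_mask(cpus):
    """Build the hex bitmask string for a list of CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu                 # bit N selects CPU N
    return format(mask, "x")


def cpus_from_mask(hex_mask):
    """Decode a mask as read from smp_affinity into a list of CPU numbers."""
    # Larger systems print the mask as comma-separated 32-bit groups.
    value = int(hex_mask.replace(",", ""), 16)
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


if __name__ == "__main__":
    print(affinity_mask([0]))            # CPU0 only
    print(affinity_mask([1, 2, 3]))      # everything except CPU0 on a quad core
    print(cpus_from_mask("0000000f"))    # all four cores
```

As root, writing the result of `affinity_mask([1, 2, 3])` to /proc/irq/<N>/smp_affinity asks the kernel to keep that IRQ off CPU0. If the hardware cannot do it, the write may fail or simply have no effect.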
The ARM Cortex-A series does not have the hardware to easily implement an IO-APIC, but I sometimes wonder if some approximation could be achieved with some GPIO trickery and a custom equivalent of the IO-APIC code sitting just above the FIQ vector as a sort of surrogate hardware IRQ handler. I’ve not figured that one out.
If real time under Linux were merely “moderately difficult” rather than very hard, I suspect so many people need it that we’d already have hard real time Linux distributions. Ubuntu itself is not meant for that, though you could get vast improvements by judiciously cutting out lots of things, including graphical desktops.
Also note that ARM does make the Cortex-R series, which is its real time hardware. Tegra chips use the Cortex-A series, which is designed for high average performance in general-purpose workloads, not for real time guarantees.