What speed is the UART running at? At 115200 or below, something unusual would likely have to occur before the Jetson falls behind. Above 115200, though, the clock for the UART in the Jetson is slightly off from where it should be, and there are other reasons beyond the UART itself why those speeds might be a bit spotty. If you use CTS/RTS flow control, then you tend to have more success at higher speeds because the Jetson can halt the flow while it processes data under load, then restart it.
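If you want to experiment with flow control from the shell, stty can enable it (I am assuming the port shows up as “/dev/ttyTHS2” here; the actual device node depends on which Jetson and which UART you wired, and the CTS/RTS lines must actually be connected):

stty -F /dev/ttyTHS2 115200 crtscts
stty -F /dev/ttyTHS2 -a

The second command just reads the settings back so you can verify crtscts took effect.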
The UART itself I think can only run on CPU0, but I could be wrong about that. The UART is available during boot, before Linux ever loads, and only CPU0 runs during boot. There is a lot of hardware on the Jetson which cannot migrate to another core, and although I am not certain, I think the UART is one such device. If you were to tell it to run on CPU1 (the first of the two Denver cores), then I think every invocation of that UART would increment the reschedule counter in “/proc/interrupts” as the scheduler puts it back on CPU0. If that is the case, then all that putting it on CPU1 would do is increase latency.
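You can check where the UART interrupt is actually being serviced. In “/proc/interrupts” each IRQ line shows one count column per CPU; the UART may appear as “serial” or by its controller address rather than as ttyTHS, so adjust the pattern to match your system:

egrep -i 'serial|resched' /proc/interrupts

If essentially all of the UART's counts land in the CPU0 column no matter what affinity you ask for, that is consistent with the wiring being fixed to CPU0.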
Note that all software interrupts can migrate to any core. Those IRQs do not touch hardware in their own code (a software driver can of course call a hardware driver, e.g., a software process might indirectly need disk reads). Many drivers which work with hardware have multiple functions, but not every function requires hardware access. As a contrived example, imagine you have an Ethernet driver, and that there is a checksum being computed. Good design would be to split the work: a hardware IRQ handler which is atomic and holds the CPU core only for the minimum time required to transfer the network data, and then, on exit from the hardware IRQ, a software IRQ invoked to run the checksum.
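You can watch that division of labor on a live system. “/proc/softirqs” lists per-CPU counts for each softirq class (NET_RX would be where receive-side work like that checksum example runs):

watch -n 1 cat /proc/softirqs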
In that example you will most often find that the hardware and software IRQs actually get scheduled on a single core. The scheduler is aware of the cache, along with the likelihood of a cache hit or miss. On a true hard RT system you won’t have cache. On a Jetson the scheduler is cache-aware; I don’t know how the RT kernel changes things, but you can be certain that cache still gets in the way of hard realtime, and that much of the scheduling simply guarantees higher priority processes run within a certain time at the cost of lower priority processes. This does not mean latency can be predicted on a Jetson, but there is a second reason I mention this: if the hardware IRQ must run on CPU0, but there is then a chain of software interrupts, there might be an advantage to scheduling the software IRQs onto a Denver core. Beware though that performance can end up not being very good, because you are more or less guaranteed a lot of cache misses. When the hardware and software IRQs run in sequence on the same core, you will most likely get a lot of cache hits.
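If you are curious where the softirq daemons sit, the psr column of ps shows the processor each thread last ran on:

ps -eo pid,psr,comm | grep ksoftirq

You will see one ksoftirqd per CPU, each tied to its own core; a softirq raised from a hardware IRQ on CPU0 is normally handled by ksoftirqd/0, which is why those cache hits come for free when everything stays on one core.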
If it turns out that you are seeing a lot of reschedules in “/proc/interrupts”, and any of this involves the Denver cores, then consider that part of the work is being rescheduled back to CPU0 because it requires the physical wiring to CPU0. If that is the case, and you can arrange two threads working together such that the hardware access is purposely bound to CPU0 (it would go there anyway; binding just prevents wasted migration time) and the other part is purposely bound to CPU1, then you probably do have an advantage. If the mechanism for working between the two cores does not cause too much cache thrashing, your timing will improve significantly at the cost of somewhat lower “average” throughput.
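One quick experiment along those lines: find the UART's IRQ number in “/proc/interrupts” (call it N here; substitute the real number), then try handing its affinity to CPU1. The affinity file takes a hex bitmask, so CPU1 is 2:

cat /proc/irq/N/smp_affinity
echo 2 | sudo tee /proc/irq/N/smp_affinity

If the interrupt is physically routed to CPU0, expect the write to fail (often with “Input/output error”), or for the counts to keep accumulating in the CPU0 column anyway.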
Can you profile (even if it is just guessing) which parts of your program mandate hardware access (the UART), and separate those from the pure software processing? Then use interprocess communication, or threading with shared memory, to run the rest. This would reduce CPU0 load, and probably the biggest timing issue on Jetsons is IRQ starvation on CPU0 due to the load which must be on that specific core.
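A cheap way to prototype that split before committing to threads is two processes connected by a pipe, pinned with taskset (the program names here are just placeholders for your own split):

taskset -c 0 ./uart_io | taskset -c 1 ./process_data

The pipe is the interprocess communication; the half touching the UART stays on CPU0 with the hardware IRQ, and the heavy processing moves off of it.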
Incidentally, just as an experiment, watch the software IRQ handling load (ksoftirqd):
watch -n 0.5 "ps aux | egrep '(ksoftirq|CPU)' | egrep -v '(grep|watch|ps)'"
Then run your application after observing the soft IRQ load. Does the soft IRQ load change when your application runs under its heaviest load? You could do something similar watching reschedules and any hardware involved in your processing; see how those counters change between your program sitting idle and running. Then decide if there are parts of your software which could run in a separate process, and move that part to CPU1 while leaving the rest on CPU0.
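The same watch pattern works for that (again, adjust the pattern to whatever your UART and other hardware show up as in “/proc/interrupts”):

watch -n 0.5 "egrep -i 'serial|resched' /proc/interrupts"

Note the per-CPU columns while idle, then again under full load; the delta tells you what migrated and what stayed pinned to CPU0.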