DirectStylus (or GPU accelerated stylus) on Jetson TK1?

Perhaps a stupid question, but can the Jetson TK1 platform support
GPU-accelerated stylus and rendering? The stylus would be connected
via SPI on the Jetson board, but it’s not clear that an extremely
low-latency path is available for applications.


I can’t give an answer, but the first thing anyone answering will ask is which drivers will be used. The answer may also depend on the graphics environment, such as OpenGL ES, so it would help to know which stylus you have and how you’ll be using it. From what I’ve seen, the answer often comes down to whether a hardware-accelerated cursor is used, and Jetson is certainly set up for that.

The device is running HID over SPI; we have been using the standard Android
HID Multitouch driver up to this point. Latency is a rather serious issue:
100 ms (complete stack latency from touchdown to screen render) may be
tolerable for small devices, but on larger-screen devices it’s pretty bad.

Here’s a set of articles (there are 6 articles all together) about SPI on the Jetson that you may find helpful:–Part-1-Why-is-SPI-important.aspx

I’m not sure what the differences would be between an Android driver and an L4T driver, but it is interesting to think about. Keep in mind that the latency issue is mostly about hardware IRQ workload and how hardware IRQs are sorted, prioritized, and scheduled for servicing. From what I’ve found in the past, pretty much every SoC that would go into a tablet or smartphone (including all ARM multi-core SoCs at this time) services hardware IRQs on only a single CPU core, whereas a desktop multicore system can service a hardware IRQ on any core and so slows down less under heavy IRQ load. This matters because, beyond any inefficiency in a given driver, IRQ-driven hardware latency mostly depends on that one core being available: all IRQ-triggered hardware handling competes for that specific core.

If you have latency issues, profiling kernel drivers can be quite difficult (it isn’t always easy in user space either, and components of the driver stack run in both user space and kernel space at the same time). The thing to keep in mind on any system where only a single CPU core can handle an interrupt is that everything else with a hardware IRQ is competing for it. The latency might be the result of another driver that locks the first core for a long time, so that other hardware IRQs have to wait rather than being serviced immediately.
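One way to check the single-core claim on your own board is to look at /proc/interrupts, which is the standard Linux per-CPU interrupt counter table. Below is a minimal Python sketch that sums the counts per core. The function names are my own, and I am assuming Python is available on the target (on Android you may need to pull the file with adb and run this on a host instead).

```python
import os


def parse_interrupts(text):
    """Parse /proc/interrupts text into (cpu_names, rows).

    Each row is (irq_label, per_cpu_counts, description).
    """
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()  # header line: CPU0 CPU1 ...
    rows = []
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Summary rows such as "Err:" have fewer columns than CPUs; skip them.
        if len(counts) != len(cpus):
            continue
        rows.append((label.strip(), counts, " ".join(fields[len(cpus):])))
    return cpus, rows


def per_cpu_totals(text):
    """Return {cpu_name: total interrupts serviced on that CPU}."""
    cpus, rows = parse_interrupts(text)
    totals = [0] * len(cpus)
    for _, counts, _ in rows:
        for i, count in enumerate(counts):
            totals[i] += count
    return dict(zip(cpus, totals))


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    with open("/proc/interrupts") as f:
        for cpu, total in per_cpu_totals(f.read()).items():
            print(cpu, total)
```

If nearly all of the device interrupts land in the CPU0 column, the stylus IRQ is sharing that core with everything else.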

The HID driver isn’t something GPU acceleration can help with. I/O from an HID device like a mouse or stylus is very low, trivial bandwidth, and the data is too simple for acceleration to matter. Rendering of what was tracked can cause latency, and hardware-accelerated OpenGL ES helps with that visualization. In that case two hardware IRQs are involved: one to service the stylus, another to service the graphics display of the stylus data. Graphics is already GPU accelerated, and the HID stylus is so simple it doesn’t need the GPU.

It is conceivable that graphics could cause a 100 ms latency, but the frame rate on Jetson under OpenGL ES is normally so much faster than this that it seems unlikely. So long as the I/O (such as from USB) hasn’t gone to sleep or been set to a low-power mode (which adds wake-up time), the HID layer (regardless of being USB or SPI) is just too simple to cause a significant workload.
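If you want to check how much of the 100 ms is spent before the event even reaches user space, one rough approach is to compare the kernel timestamp carried in each evdev event with the time at which your process reads it. This is only a sketch under assumptions: the device path is whatever event node your stylus shows up as (passed on the command line here), the struct layout is the classic `struct input_event` with a native `timeval`, and event timestamps use the default realtime clock.

```python
import struct
import sys
import time

# struct input_event: timeval (seconds, microseconds), type, code, value.
# Native sizes are used so the same format works on 32-bit and 64-bit builds.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)


def parse_event(buf):
    """Decode one raw input_event into (timestamp_seconds, type, code, value)."""
    sec, usec, etype, code, value = struct.unpack(EVENT_FORMAT, buf)
    return sec + usec / 1e6, etype, code, value


def delivery_latency_ms(event_time, now):
    """Time between the kernel stamping the event and user space seeing it."""
    return (now - event_time) * 1000.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python evlat.py /dev/input/event2  (the path is an assumption)
    with open(sys.argv[1], "rb") as dev:
        while True:
            buf = dev.read(EVENT_SIZE)
            if len(buf) < EVENT_SIZE:
                break
            stamp, etype, code, value = parse_event(buf)
            print("type=%d code=%d value=%d latency=%.3f ms"
                  % (etype, code, value, delivery_latency_ms(stamp, time.time())))
```

This only covers the path from the input core to a user space reader. It says nothing about the SPI transfer itself or about rendering, but if the numbers here are small, the remaining latency is somewhere else in the stack.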

I don’t know exactly how to do it, but the smoking gun on the latency issue is a question of which other hardware IRQ or hardware device is, on average, either holding on to the first core for the longest time or causing the stylus to wait longer for IRQ handling. If you could disable non-critical hardware IRQ servicing (e.g., build support as a module and remove the module), then you might see a reduction in latency as an IRQ hog is removed. Maybe it is even simpler; perhaps some of the hardware can be disabled via sysctl.conf.
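A crude first pass at finding a candidate IRQ hog is to snapshot /proc/interrupts twice and rank the sources by how many interrupts each one fired in between. This Python sketch does that; the helper names are my own. Note the limitation: it counts how often each IRQ fires, not how long each handler holds the core, so a rare but slow handler will not stand out here.

```python
import sys
import time


def irq_counts(text):
    """Map 'label description' -> interrupt count summed over all CPUs."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header line: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        nums = [int(f) for f in fields[:ncpu] if f.isdigit()]
        name = (label.strip() + " " + " ".join(fields[ncpu:])).strip()
        totals[name] = sum(nums)
    return totals


def busiest(before, after, top=5):
    """Return up to `top` (name, delta) pairs, largest increase first."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, delta) for name, delta in ranked[:top] if delta > 0]


def snapshot(path="/proc/interrupts"):
    with open(path) as f:
        return irq_counts(f.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python irqrate.py 5   (sample window in seconds)
    window = float(sys.argv[1])
    first = snapshot()
    time.sleep(window)
    for name, delta in busiest(first, snapshot()):
        print("%8.1f irq/s  %s" % (delta / window, name))
```

Run it while using the stylus, then again after removing a suspect module, and compare both the ranking and the measured stylus latency.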

One thing you’d want to do under any circumstance is to make sure your testing is with Jetson in high performance mode, with no battery saving or sleep getting in the way. See this:

If you have a user space process running, you might also test latency with and without renicing the priority of that process.
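As a rough illustration of the renice test, here is a Python sketch that changes the niceness of the current process and measures how late it wakes up from short sleeps. The wake-up jitter is only a stand-in for whatever latency measurement you actually use with the stylus, and the function names are my own. Raising priority (a negative nice value) normally requires root.

```python
import os
import sys
import time


def set_niceness(niceness, pid=0):
    """Set the nice value of `pid` (0 = this process) and return the new value."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


def wakeup_jitter_ms(samples=50, interval=0.002):
    """Worst observed oversleep, in milliseconds, over `samples` short sleeps."""
    worst = 0.0
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval)
        oversleep = (time.monotonic() - start - interval) * 1000.0
        worst = max(worst, oversleep)
    return worst


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: sudo python nicetest.py -10
    print("default niceness: worst oversleep %.3f ms" % wakeup_jitter_ms())
    new_value = set_niceness(int(sys.argv[1]))
    print("niceness %d: worst oversleep %.3f ms" % (new_value, wakeup_jitter_ms()))
```

If the numbers barely change with priority, the delay is more likely on the IRQ side than in user space scheduling.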