How to re-purpose peripheral processors on TX2

For my project I would like to have control over numerous GPIO lines, without relying on processing power from the primary processors (A57/Denvers). I notice there is both an AlwaysOn (R5) and an Audio Engine (A9) processor on the SOM. Is it possible to re-purpose one of these processors to handle GPIO logic in close to real time? What methods are available for programming those peripheral processors?

Thanks!

Hi bryant
I’ve attached a sample for reference.

This gpio test app will demonstrate the gpio in/out as well as the
gpio-irq functionality.
To use this gpio demo code, connect a jumper between Pins 15 and 17 of
the J26-GPIO_EXPANDER_2 and program the SCR for the Z_0 and Z_1 pins in
the mobile_scr.cfg as mentioned in the documentation.
Use the configENABLE_GPIO_APP to enable this demo gpio-app.
6d07209.diff.zip (3.56 KB)

I looked at the .diff you supplied and I am a bit confused as to what source that diff is based on. I am using the kernel-4.4 from Jetpack 3.1 (Jetson TX2), and it looks like the files you are changing do not exist for me, or are vastly different. Is there any way you could give me a bit more information on what I need to do to apply this patch?

Thanks!
Bryant

@bryant
This sample is an RTOS app; I’m not sure if it’s helpful for you.
Could you give more information about your use case? What state do you want to control the GPIO pin in?

I am using a Jetson TX2 development board, along with the Jetpack 3.1 package. I followed the instructions found at:

I am able to toggle/read GPIO pins using the ‘sysfs’ method, but I would like to use the Audio Processing Engine (Arm A9) peripheral processor as a general purpose IO handler. I have hard real-time requirements for GPIO and want to be able to load my own firmware onto that A9 chip. It appears that it currently is held in reset, and is not being utilized. If you have any information regarding how to program the APE/ADSP with a custom firmware package, or possibly any example code of something like this, it would be greatly appreciated!

Thanks!
Bryant

When you say “hard real time,” what do you actually mean? Seconds? Milliseconds? Microseconds? How sensitive are you to delay, compared to jitter?

If you spin up a real time thread on one of the main cores, and don’t block it on things other than your needs for GPIO, you should be able to achieve very tight timing.
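As a rough illustration of the "lock it to one core" part, here is a minimal user-space sketch using the Linux sched_setaffinity() call. The helper names pin_to_cpu() and is_pinned_to() are made up for this example, and which core number is appropriate on a TX2 is something you would have to decide yourself.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <errno.h>

/* Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = caller */
}

/* Return 1 if the caller's affinity mask is exactly {cpu}, else 0. */
int is_pinned_to(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return 0;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

Pinning only controls where your thread runs; it does not by itself keep other tasks or interrupts off that core.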

If you write a custom driver that talks to the GPIO hardware, from within such a real-time thread, you should be able to get better than microsecond timing using that method.

The issue to consider is whether GPIO is able to reach any CPU core other than CPU0. Interrupts go through an aggregator, and then the aggregator feeds the core. So far as I know (and this needs to be confirmed as correct or incorrect) the aggregator is only able to feed CPU0.

If you watch “/proc/interrupts” I think you’ll find anything which involves a hardware IRQ shows only on CPU0. Timers show on other cores, but there is basically nothing else on any core except CPU0. Software interrupts have no limitation on how interrupts are physically wired. Assuming you can do everything you want on the APE I’m not so sure that a GPIO can reach the APE with realtime behavior since CPU0 is the interface.
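If you would rather check this programmatically than eyeball it, something like the following sketch could parse the rows of /proc/interrupts and flag the ones whose counts sit only on CPU0. The function names and MAX_CPUS are invented for this example, and it assumes the usual "label: count count ... description" row layout.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAX_CPUS 16  /* assumed upper bound, more than enough for a TX2 */

/* Parse one /proc/interrupts row ("label: n0 n1 ... description").
 * Stores up to max per-CPU counts and returns how many were read,
 * or -1 if the line has no "label:" prefix (e.g. the CPU header row). */
int parse_irq_line(const char *line, unsigned long *counts, int max)
{
    const char *p = strchr(line, ':');
    int n = 0;

    if (p == NULL)
        return -1;
    p++;
    while (n < max) {
        char *end;
        unsigned long v;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                      /* reached the description text */
        v = strtoul(p, &end, 10);
        if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0')
            break;                      /* token is not a plain number */
        counts[n++] = v;
        p = end;
    }
    return n;
}

/* Return 1 if CPU0 has a nonzero count and every other CPU has zero. */
int only_on_cpu0(const unsigned long *counts, int n)
{
    if (n < 1 || counts[0] == 0)
        return 0;
    for (int i = 1; i < n; i++)
        if (counts[i] != 0)
            return 0;
    return 1;
}

/* Print every row of an open /proc/interrupts stream that is CPU0-only. */
void print_cpu0_only_rows(FILE *f)
{
    char line[1024];
    unsigned long counts[MAX_CPUS];

    while (fgets(line, sizeof(line), f) != NULL) {
        int n = parse_irq_line(line, counts, MAX_CPUS);
        if (n > 1 && only_on_cpu0(counts, n))
            fputs(line, stdout);
    }
}
```

You would call print_cpu0_only_rows() on the result of fopen("/proc/interrupts", "r"). A count of zero on the other cores only shows where interrupts have been delivered so far, not where the hardware is able to deliver them.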

For true hard real-time, you don’t need interrupts; you write your loop to scan inputs and generate outputs and loop; turning off interrupts because they will just add jitter.
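To make the shape of such a loop concrete, here is a minimal sketch that mirrors one input bit onto one output bit by polling. The two pointers stand in for memory-mapped GPIO registers; the real register addresses and bit layout for the TX2 are not given here and would have to come from the SoC documentation. The max_iters bound exists only so the sketch can be exercised on a desktop.

```c
#include <stdint.h>

/* Poll an input bit and copy its level to an output bit.
 * in_reg / out_reg are placeholders for memory-mapped GPIO registers
 * (hypothetical; real addresses are not part of this sketch).
 * max_iters == 0 means spin forever.
 * Returns the number of times the output was updated. */
unsigned long poll_and_mirror(volatile const uint32_t *in_reg,
                              volatile uint32_t *out_reg,
                              uint32_t in_mask, uint32_t out_mask,
                              unsigned long max_iters)
{
    unsigned long changes = 0;
    int last = -1;                      /* input level not yet known */

    for (unsigned long i = 0; max_iters == 0 || i < max_iters; i++) {
        int level = (*in_reg & in_mask) != 0;

        if (level != last) {
            if (level)
                *out_reg |= out_mask;
            else
                *out_reg &= ~out_mask;
            last = level;
            changes++;
        }
    }
    return changes;
}
```

Turning interrupts off is not something a user-space process can do, so in practice a loop like this would live in a kernel module or on a bare-metal core; the sketch only shows the scan-and-respond structure.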
But NVIDIA has previously indicated that the “other cores” are “not usable by developers,” for similar questions, so I think the best option is to figure out how to lock a real-time Linux thread to one core, and have that do the work you need to do, if “normal Linux” can’t serve whatever the timing requirements are.

The trouble is that the APE core probably has no access to the state of the GPIO without CPU0’s help…there is no scanning of inputs without the intervention of CPU0, so it may not even matter if APE can be repurposed if GPIO has a non-RT connection to the core.

NVIDIA may be able to comment more on repurposing the audio core…there has been interest in this in the past, but I’ve not heard if this had any progress. I personally have some interest in RT, but for my purposes I’ve had doubts that the APE could do what I want due to CPU0 being the hardware interface (certainly it could help my case, but it wouldn’t be the same solution as an RT core with direct handling of hardware/GPIO inputs).

So you’re saying I would have to dedicate an entire A57 core to achieve true real time GPIO? It seems like I would be wasting a lot of potential compute power. That is why I would like to use one of the peripheral processors.

When you say ‘custom driver’ are you referring to using the methods defined in linux/gpio.h (e.g. gpio_get_value()), or do you mean an even lower level than that?

The functions in linux/gpio.h are kernel space (you can’t call them from user space), so you’d have to write a driver / kernel module to actually use them. The only other Linux APIs to GPIOs that I know about are sysfs, which is slow (a read()/write() call and kernel transition for each change), and various custom memory-mapped options (which, again, end up requiring a kernel-level driver).
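For reference, the sysfs route looks roughly like this from C. It is only a sketch: the helper names are invented, the line is assumed to be already exported and configured as an output, and the GPIO number you pass in depends on your board.

```c
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/gpio/gpio<N>/<attr>" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
int gpio_sysfs_path(char *buf, size_t len, unsigned gpio, const char *attr)
{
    int n = snprintf(buf, len, "/sys/class/gpio/gpio%u/%s", gpio, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string to a sysfs attribute. Returns 0 or -1. */
int sysfs_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)                 /* the write is flushed here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Drive an already-exported output line high or low. */
int gpio_set(unsigned gpio, int level)
{
    char path[64];

    if (gpio_sysfs_path(path, sizeof(path), gpio, "value") != 0)
        return -1;
    return sysfs_write(path, level ? "1" : "0");
}
```

Every gpio_set() call here opens, writes, and closes a file, which is exactly the per-change system call overhead described above.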

Whether you need to dedicate a full core to achieve “true” real-time I/O depends entirely on your timing requirements, which you haven’t told us yet.
Whether that compute power is “wasted” or not depends on whether you have anything else to do with it, and what your specific timing requirements are.

You can also stick a microcontroller on a bus (SPI, I2C, CAN, UART, USB) and have that take care of whatever bit-twiddling and reaction you need, and let the general-purpose computer with GPU do computer-with-GPU things (which generally run on a “per frame” basis, not a “per microsecond” basis.) Or if you need nanosecond-scale responses, an FPGA or CPLD of some sort.

The engineering solution space is big, and it’s impossible to know what the “right” answer is without having a very good idea about what the specific requirements are.

So: When you say “true real time,” what kind of jitter in response times can you live with? Is 1 microsecond jitter okay? What about 10 microseconds? 100 microseconds? 1 millisecond? What kind of latencies are acceptable?

I’d like it so that when I toggle an input, the kernel is able to set an output in response in less than 10 microseconds. I have it doing just that now, using a polling loop, setting CPU affinity (isolcpus=5), and isolating one of the A57 CPUs as much as I can. However, the tegra186_timer5 interrupt still fires every jiffy (4ms), and takes my response time from ~2us to over 10us (measured with a digital logic analyzer).
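A logic analyzer is the real measurement, but a quick software cross-check of how long a core gets taken away from a spinning task can be sketched in user space as below. This is not the kernel module from the post; it just spins on CLOCK_MONOTONIC and reports the largest gap between consecutive reads. The function names are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Read the clock `iters` times back to back and return the largest
 * gap (in ns) seen between two consecutive reads. A gap much larger
 * than the typical one suggests the loop was interrupted. */
int64_t max_gap_ns(unsigned long iters)
{
    int64_t prev = now_ns();
    int64_t worst = 0;

    for (unsigned long i = 0; i < iters; i++) {
        int64_t t = now_ns();

        if (t - prev > worst)
            worst = t - prev;
        prev = t;
    }
    return worst;
}
```

Run it pinned to the isolated core (for example under taskset) with a large iteration count, and compare the worst gap with and without whatever isolation settings you are experimenting with.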

I am running a test to toggle the gpio as fast as I can inside a LKM with:

taskset -c 5 sudo insmod gpio_test.ko

Is there anything else I can do to make one CPU just do a single task, and nothing else?

As long as it is just that core (and not CPU0) you might test maxing out the priority of the process, e.g., renice to “-20”…you have to be careful with that though since if you put that high of a priority in the wrong place and it gets to the wrong core things won’t go well. “man renice” and “man nice”, or just run something like “htop” and interactively change priority of the process…be absolutely certain there won’t be any new threads/processes spawning off which might go to a new core with that priority.

renice still keeps the process in the regular queue; on Linux, you can set the process to real-time priority to make it even higher priority.
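A minimal sketch of what "real-time priority" means in code, using the POSIX sched_setscheduler() call with SCHED_FIFO. The helper names are made up for this example, and the call normally needs root or CAP_SYS_NICE.

```c
#include <sched.h>
#include <string.h>
#include <errno.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Move the calling process to the SCHED_FIFO real-time class.
 * Returns 0 on success, otherwise the errno value
 * (typically EPERM when run without sufficient privileges). */
int make_realtime(int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = clamp_fifo_priority(prio);
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}
```

Be aware that a SCHED_FIFO task that never blocks can starve everything else on its core, and that Linux by default throttles real-time tasks through /proc/sys/kernel/sched_rt_runtime_us, so a pure busy loop may still see periodic gaps unless that setting is changed.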

However, if the hardware is such that only core 0 actually has access to the GPIO hardware, then you’re probably screwed.
If the hardware is such that the A57 cores all have access to GPIO, it should work, assuming you set the affinity to an A57 core.

Another option is to add a $3 microcontroller to deal with the bit toggling, and let the Jetson deal with the image processing and networking and such …

Yes, there is the risk…if CPU0 is required then part of the process can run on another core, but CPU0 is mandatory for some hardware. Latency due to an external I/O is much harder to solve than latency of pure software.

Thank you both for all your input! It has been really helpful. I am going to investigate the thread priority stuff and see if that is a possible route.