Sorry, what follows is long, but I find it interesting… read at your own risk! This is more or less for my own entertainment; I'm not sure it would be practical commercially.
From number 5 of the above:
I would think that USB is problematic unless it is operating in isochronous mode. I am curious whether this is the case. I think isochronous mode is the most difficult to work with on the Jetson side. That is not so much due to USB itself (the Jetson would be in host mode, and it can work with isochronous transfers), but because the Jetsons would probably have trouble consuming that much data with any sort of hard real-time processing. I am guessing the biggest challenge is getting data into and out of the GPU without losing “frames” of data. The Audio Processing Engine (“APE”) uses a Cortex-R5 processor, which is good at hard real-time work, but memory transfer likely offers no such hard real-time guarantee (a buffer would be needed, and the buffer would harm latency).
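To put rough numbers on the buffering concern, here is a back-of-the-envelope sketch. The array parameters (8 microphones, 48 kHz, 24-bit samples) are hypothetical and not taken from any real product, and the function names are made up for illustration; the only fixed fact used is that high-speed USB runs 8000 microframes per second (one every 125 µs).

```python
# Back-of-the-envelope numbers for streaming a microphone array over USB.
# All array parameters are hypothetical examples, not measured values.

MICROFRAMES_PER_S = 8000  # high-speed USB: one microframe every 125 microseconds


def stream_rate_bytes_per_s(num_mics: int, sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Raw PCM data rate for the whole array."""
    return num_mics * sample_rate_hz * bytes_per_sample


def bytes_per_microframe(rate_bytes_per_s: int) -> float:
    """Payload each isochronous microframe must carry to keep up."""
    return rate_bytes_per_s / MICROFRAMES_PER_S


def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Delay added by a buffer that must fill before it is handed on (e.g. to the GPU)."""
    return 1000.0 * buffer_samples / sample_rate_hz


if __name__ == "__main__":
    rate = stream_rate_bytes_per_s(num_mics=8, sample_rate_hz=48_000, bytes_per_sample=3)
    print(f"array data rate: {rate} bytes/s")
    print(f"per microframe:  {bytes_per_microframe(rate)} bytes")
    for n in (64, 256, 1024):
        print(f"{n:5d}-sample buffer adds {buffer_latency_ms(n, 48_000):.2f} ms")
```

With those example numbers the array needs 1,152,000 bytes/s, or 144 bytes per microframe, which is modest for USB. The latency cost comes mostly from the buffer size chosen on the receiving side: a 1024-sample buffer alone adds about 21 ms at 48 kHz.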
From number 6 of the above:
In the case of an alarm that works by sampling what is essentially a set of resonant frequencies of a complex enclosed 3D shape, one would not necessarily even care about something like road noise or people speaking. The idea there is to cause feedback at multiple frequencies “tuned” to the inside of the car’s “shape”, and to watch this with something like a Fourier transform. If the selected resonant frequencies change, then the shape has changed, and it is time for an alarm.
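One cheap way to watch a handful of selected resonances without computing a full FFT is the Goertzel algorithm, which evaluates a single DFT frequency at a time. The sketch below is a minimal illustration, assuming excitation and recording are handled elsewhere; `goertzel_power` and `find_peak` are hypothetical helper names, and scanning a narrow band around each expected resonance to locate its peak is just one possible approach.

```python
import math


def goertzel_power(samples, sample_rate_hz, freq_hz):
    """Squared magnitude of one DFT frequency, computed with the Goertzel recurrence."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(omega)
    s1 = s2 = 0.0  # s[n-1] and s[n-2] of the recurrence
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def find_peak(samples, sample_rate_hz, center_hz, span_hz, step_hz):
    """Frequency with the most energy within center_hz +/- span_hz (hypothetical helper)."""
    steps = int(round(2 * span_hz / step_hz))
    candidates = [center_hz - span_hz + i * step_hz for i in range(steps + 1)]
    return max(candidates, key=lambda f: goertzel_power(samples, sample_rate_hz, f))


if __name__ == "__main__":
    fs = 8000
    # Synthetic stand-in for a recording: a resonance expected near 440 Hz
    # that has drifted because the "shape" of the enclosure changed.
    recording = [math.sin(2 * math.pi * 446 * k / fs) for k in range(4000)]
    print(find_peak(recording, fs, center_hz=440, span_hz=40, step_hz=1))
```

Goertzel costs one multiply-add per sample per frequency, so it is attractive when only a few resonances are of interest; with many frequencies a batched FFT (e.g. on the GPU) would likely win.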
Unlike many alarms, you could probably set up the array software to be immune to things like other cars driving by (the resonant frequencies would not change), mice moving (a window defining how far a resonant frequency must shift before it triggers), or a fan turning on (I once read a great story about a mystery alarm that turned out to go off whenever the fax machine ran in the middle of the night). Such a system would also be expandable to large buildings, which is why this much complexity is of interest. Listening to multiple microphones would allow more complex resonant shapes to be monitored.
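The "window" idea could look something like this: compare each microphone's current resonant peak frequencies against a stored baseline, ignore shifts smaller than a tolerance (the mouse case), and only alarm when several resonances move. The data layout, the tolerance value, and the `min_offenders` rule are all assumptions for illustration; the peak frequencies themselves would come from some Fourier-based peak search.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: microphone id -> resonant peak frequencies in Hz, same order each time.
Peaks = Dict[str, List[float]]


def shifted_resonances(baseline: Peaks, current: Peaks, tolerance_hz: float) -> List[Tuple[str, int, float]]:
    """Return (mic, peak index, shift in Hz) for every peak that moved beyond the tolerance."""
    offenders = []
    for mic, base_peaks in baseline.items():
        for i, (b, c) in enumerate(zip(base_peaks, current[mic])):
            shift = c - b
            if abs(shift) > tolerance_hz:
                offenders.append((mic, i, shift))
    return offenders


def should_alarm(baseline: Peaks, current: Peaks, tolerance_hz: float, min_offenders: int = 2) -> bool:
    """Alarm only if enough resonances moved; a single small glitch is ignored."""
    return len(shifted_resonances(baseline, current, tolerance_hz)) >= min_offenders


if __name__ == "__main__":
    baseline = {"front": [440.0, 612.0], "rear": [440.0, 612.0]}
    mouse = {"front": [441.0, 612.0], "rear": [440.0, 611.5]}      # tiny shifts
    door_open = {"front": [452.0, 630.0], "rear": [448.0, 612.0]}  # shape changed
    print(should_alarm(baseline, mouse, tolerance_hz=3.0))
    print(should_alarm(baseline, door_open, tolerance_hz=3.0))
```

Broadband noise such as a passing car raises energy across the spectrum but does not move the peaks, so it never enters this comparison at all.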
The part that is not obvious is that it isn’t just the total spectrum of the key resonant components that matters, but also the timing of when a given spectrum is measured at different microphones. To illustrate, consider stereo headphones, which work well for stereo positional audio. However, there are also manufacturers of “surround sound” headphones. One might wonder how this is possible, and not just a gimmick, when we have only two ears. Positional audio in headphones sort of works because certain tones that we want to “hear” with more directional information are given controlled delays… either to both the left and right headphone, or to the left versus the right. This produces an illusion of more directional information for the human listener (a microphone array, by contrast, sees an actual time delay when the same sound reaches a second microphone later than it reaches some other microphone).
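For the microphone-array side, the arrival-time difference between two microphones is commonly estimated with cross-correlation: slide one signal against the other and keep the lag with the largest correlation. Below is a brute-force sketch (O(samples × lags), fine for illustration; a real system would more likely compute it via an FFT, e.g. on a GPU). `estimate_delay` is a hypothetical name, and the noise signal is synthetic.

```python
import random


def estimate_delay(ref, other, max_lag):
    """Lag (in samples) that best aligns `other` with `ref`.

    A positive result means the sound reached `other` later than `ref`.
    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(other):
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [rng.gauss(0.0, 1.0) for _ in range(500)]
    other = [0.0] * 7 + ref[:-7]  # same sound, arriving 7 samples later
    lag = estimate_delay(ref, other, max_lag=20)
    fs = 48_000  # hypothetical sample rate
    print(lag, "samples =", 1000 * lag / fs, "ms")
```

At 48 kHz one sample corresponds to roughly 7 mm of extra path length at the speed of sound (about 343 m/s), which gives a feel for how finely an array can resolve position changes.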
I think if you were to check for multiple resonant frequencies in a complex enclosed space by audio, then the timing of when the signal arrives at microphones in different locations would make for a much more reliable system (relative delays between microphones can also be run through a Fourier transform, turning time delays of similar audio into a set of discrete cosines… for an alarm it isn’t just about what the array receives, it is also about shifts in which microphone receives it when). This sort of data works great on a GPU.
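The Fourier-domain view of the same idea: a delay τ multiplies a signal's spectrum by e^(-j2πfτ), so at any single resonant frequency the delay between two microphones shows up as a phase difference. A minimal sketch, assuming one known frequency of interest and synthetic sine signals; the function names are hypothetical. Because phase wraps at ±π, the estimate is only unambiguous while |τ| < 1/(2f).

```python
import cmath
import math


def dft_bin(samples, sample_rate_hz, freq_hz):
    """Complex DFT coefficient of `samples` at a single frequency."""
    w = -2j * math.pi * freq_hz / sample_rate_hz
    return sum(x * cmath.exp(w * n) for n, x in enumerate(samples))


def delay_from_phase(ref, other, sample_rate_hz, freq_hz):
    """Delay of `other` relative to `ref`, in seconds, measured at one frequency.

    A delay tau gives other/ref a phase of -2*pi*f*tau at frequency f.
    Only unambiguous for |tau| < 1 / (2 * f) because the phase wraps.
    """
    ratio = dft_bin(other, sample_rate_hz, freq_hz) / dft_bin(ref, sample_rate_hz, freq_hz)
    return -cmath.phase(ratio) / (2.0 * math.pi * freq_hz)


if __name__ == "__main__":
    fs, f, tau = 8000, 440.0, 0.0005  # hypothetical: 440 Hz resonance, 0.5 ms delay
    n = range(4000)  # 0.5 s, an integer number of 440 Hz cycles
    ref = [math.sin(2 * math.pi * f * k / fs) for k in n]
    other = [math.sin(2 * math.pi * f * (k / fs - tau)) for k in n]
    print(f"estimated delay: {delay_from_phase(ref, other, fs, f) * 1000:.3f} ms")
```

Doing this for several resonances and several microphone pairs at once is exactly the kind of regular, batched arithmetic a GPU handles well.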
From number 8 above:
Just speculating, but I suspect that not only is the timing of the various chambers of the heart being measured, but also the “shape” of how the contraction travels like a wave over the muscle. To illustrate, if someone has had a heart attack in the past, then some part of the heart muscle has died and no longer contracts, but this is only part of a given chamber. Knowing the timing of the heart's contractions is important, but knowing whether the contraction is a smooth wave traveling over the heart, versus hearing some detail of a non-smooth contraction, might give hints of muscle damage. An array might be able to detect that sort of defect, whereas a single microphone would not have that ability.
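Purely as an illustration of why an array helps here, and not as any kind of medical method: if each sensor along a line reports when the contraction wave passes it, a smooth wave gives arrival times close to a straight line versus position, while a region that doesn't contract normally would show up as a sensor whose arrival time sits far off that line. The sketch assumes a 1-D sensor layout, a constant propagation speed, and already-extracted arrival times; all names and numbers are hypothetical.

```python
from typing import List, Tuple


def fit_wavefront(positions_cm: List[float], arrivals_ms: List[float]) -> Tuple[float, float]:
    """Least-squares line t = t0 + slope * x; slope is the inverse propagation speed (ms/cm)."""
    n = len(positions_cm)
    mx = sum(positions_cm) / n
    mt = sum(arrivals_ms) / n
    sxx = sum((x - mx) ** 2 for x in positions_cm)
    sxt = sum((x - mx) * (t - mt) for x, t in zip(positions_cm, arrivals_ms))
    slope = sxt / sxx
    return mt - slope * mx, slope


def irregular_sensors(positions_cm: List[float], arrivals_ms: List[float], tolerance_ms: float) -> List[int]:
    """Indices of sensors whose arrival time deviates from a smooth traveling wave."""
    t0, slope = fit_wavefront(positions_cm, arrivals_ms)
    return [i for i, (x, t) in enumerate(zip(positions_cm, arrivals_ms))
            if abs(t - (t0 + slope * x)) > tolerance_ms]


if __name__ == "__main__":
    positions = [0.0, 2.0, 4.0, 6.0, 8.0]       # hypothetical sensor spacing, cm
    smooth = [10.0, 11.0, 12.0, 13.0, 14.0]     # hypothetical arrival times, ms
    late_spot = [10.0, 11.0, 15.0, 13.0, 14.0]  # sensor 2 sees the wave late
    print(irregular_sensors(positions, smooth, tolerance_ms=1.0))     # []
    print(irregular_sensors(positions, late_spot, tolerance_ms=1.0))  # [2]
```

Ordinary least squares lets a single bad point pull the fitted line toward itself, so with more sensors a robust fit (e.g. median-based) would be a reasonable upgrade. A single microphone cannot produce this kind of position-versus-time data at all.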