Hi mpfeffer
As Shane says, USERPTR seems to be what you need in order to avoid an additional memory copy. And yes, it is supported on the TX1. However, plain malloc won’t work for that purpose, because many drivers have alignment and/or contiguity requirements. This is despite what the V4L2 documentation states:
If required by the hardware the driver swaps memory pages within physical memory to create a continuous area of memory. This happens transparently to the application in the virtual memory subsystem of the kernel.
I’ve seen that this is not the case for many drivers, especially on embedded systems. They simply fail when queuing the buffers. You may use an alternative allocator such as posix_memalign (http://man7.org/linux/man-pages/man3/posix_memalign.3.html).
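To make the allocator suggestion concrete, here is a minimal sketch of allocating a page-aligned buffer with posix_memalign. The helper names (round_up, alloc_page_aligned) are made up for illustration, and page alignment is only an assumption: your driver may want a different alignment. Also note that posix_memalign only controls alignment of the virtual address; it does not give you physically contiguous memory.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign and sysconf */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: round size up to a multiple of align.
 * align must be a non-zero power of two. */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: allocate a zeroed, page-aligned buffer of at
 * least `size` bytes. Page alignment is an assumption; check what your
 * driver actually requires. On success the padded length is stored in
 * *out_len (if non-NULL). Returns NULL on failure. Release with free(). */
static void *alloc_page_aligned(size_t size, size_t *out_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0)
        return NULL;

    size_t len = round_up(size, (size_t)page);
    void *buf = NULL;

    /* posix_memalign returns 0 on success and an error number otherwise;
     * it does not set errno. */
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;

    memset(buf, 0, len);
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

For USERPTR, the returned pointer and length would then go into the m.userptr and length fields of struct v4l2_buffer (with memory set to V4L2_MEMORY_USERPTR) before calling VIDIOC_QBUF. If the driver still rejects the buffer at that point, it most likely needs contiguous memory, which this approach cannot provide.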
Take a look at https://github.com/fastr/yavta. It is a simple C application that builds and runs out of the box on the TX1. You can use it to test USERPTR and use its code as a reference.
We have documented our learnings in the following wiki page. The info there, although platform-generic, was originally written while experimenting with a TX1.