CSI-2 scatter gather DMA

mpfeffer · October 24, 2016, 7:25am

Hi. For my company I want to modify/develop the driver for a CSI-2 camera on the Jetson TX1 board (e.g. the one that ships with the Jetson TX1). In our application, our customers want to allocate memory in user space (e.g. using malloc or the C++ new operator). The allocated memory block may of course be scattered, i.e. it may consist of multiple physical memory blocks mapped to one contiguous block in the application's virtual address space. We now want to acquire a frame from the CSI-2 camera directly into the memory that the customer allocated in user space (important: without any additional memory copy!). The reason is to save memory bandwidth (every extra copy consumes memory bandwidth) and CPU usage in order to achieve maximum performance. The worst case in the end is multiple CSI-2 cameras attached, all of them streaming frames at maximum CSI-2 bandwidth. For that, the memory would have to be mapped into kernel space, and scatter-gather DMA would have to be used to tell the hardware which physical blocks of memory to write the data to.
Is such an operation possible on the Jetson TX1, or are there any other ideas?

Thanks,
Marcus

ShaneCCC · November 3, 2016, 3:01am

Hi
I believe you can use V4L2_MEMORY_USERPTR for your purpose.

mpfeffer · November 7, 2016, 11:16am

Does that mean the Jetson TX1 actually supports DMA transfers from CSI into scattered user-mode (virtual) memory? Where can I find documentation on this?

michael_gruner · November 7, 2016, 5:09pm

Hi mpfeffer

As Shane says, USERPTR seems to be what you need in order to avoid an additional memory copy. And yes, it is supported by the TX1. However, plain malloc won’t work for that purpose. Many drivers have alignment and/or contiguity requirements, even though the V4L2 documentation states:

If required by the hardware the driver swaps memory pages within physical memory to create a continuous area of memory. This happens transparently to the application in the virtual memory subsystem of the kernel.

I’ve seen that this is not the case for many drivers, especially on embedded systems. They simply fail when queuing the buffers. You may use an alternative allocator such as posix_memalign: http://man7.org/linux/man-pages/man3/posix_memalign.3.html.
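As a rough illustration of the allocator suggestion, here is a minimal sketch of a page-aligned allocation with posix_memalign. The helper names `round_up_to_page` and `alloc_frame_buffer` are made up for this example, and page alignment plus page-size padding is only an assumption about what a given driver wants. Note that this only aligns the virtual address; it does not make the buffer physically contiguous.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round size up to a whole number of pages. */
size_t round_up_to_page(size_t size, size_t page)
{
    return (size + page - 1) / page * page;
}

/*
 * Hypothetical helper (not part of any library): returns a page-aligned
 * buffer of at least `size` bytes, or NULL on failure. The padded size is
 * stored in *alloc_size when that pointer is non-NULL.
 * Release the buffer with free().
 */
void *alloc_frame_buffer(size_t size, size_t *alloc_size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *ptr = NULL;
    size_t padded;

    if (page <= 0 || size == 0)
        return NULL;

    padded = round_up_to_page(size, (size_t)page);

    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&ptr, (size_t)page, padded) != 0)
        return NULL;

    if (alloc_size)
        *alloc_size = padded;
    return ptr;
}
```

Whether page alignment is enough depends on the capture driver, so treat the alignment value as something to verify against the hardware you are using.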

Take a look at https://github.com/fastr/yavta. It is a simple C application that builds and runs out of the box on the TX1. You can use it to test USERPTR and use its code as a reference.

We have documented our learnings in the following wiki page. The info there, although platform generic, was originally written playing with a TX1.

mpfeffer · November 8, 2016, 8:40am

Thank you for your hints. I understand the yavta example and how to use V4L2 to grab images. However, my task is to write a V4L2 driver for a new camera. For that I need documentation on how scatter-gather DMA can be used inside the driver. E.g. how can I implement V4L2_MEMORY_USERPTR myself if I write a driver? I will eventually have to do somewhat fancier things than just implementing V4L2_MEMORY_USERPTR, but broken down, the issue mostly comes down to the proper use of scatter-gather DMA.

Thank you,
Marcus

ShaneCCC · November 8, 2016, 9:04am

You don’t need to implement V4L2_MEMORY_USERPTR if you want to write a V4L2 sensor driver. V4L2_MEMORY_USERPTR is implemented by the NVIDIA VI/CSI driver.