PCIe Driver zero-copy DMA


I am working on a PCIe driver for a Jetson TX2 (currently on a TX2 developer carrier board). We would like to move raw LiDAR sensor data from our hardware over PCIe into the TX2’s GPU memory for post-processing. The LiDAR sensor has an Arria 10 FPGA, which is where we will embed the PCIe IP. We need around 16 Gbps of throughput.

Are there reference PCIe drivers with a zero-copy DMA implementation for the TX2? Any starting point for my driver development would be appreciated.




I’m afraid 16 Gbps may not be possible with the TX2. Its PCIe hardware tops out at Gen2 x4, and after accounting for protocol overhead we can get about 13 Gbps of usable bandwidth.
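To make the arithmetic behind that figure explicit, here is a rough sketch in Python. The per-generation signalling rates and line encodings are the standard PCIe values; the function names are made up for illustration, and the "implied protocol efficiency" is simply the 13 Gbps estimate above divided by the post-encoding rate, not a measured number.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),       # 8b/10b encoding
    2: (5.0, 8 / 10),       # 8b/10b encoding
    3: (8.0, 128 / 130),    # 128b/130b encoding
    4: (16.0, 128 / 130),   # 128b/130b encoding
}


def link_data_rate_gbps(gen, lanes):
    """Data rate in Gbps after line encoding, before packet (TLP/DLLP) overhead."""
    gt_per_s, encoding_efficiency = PCIE_GENERATIONS[gen]
    return gt_per_s * lanes * encoding_efficiency


REQUIRED_GBPS = 16.0          # throughput requested in the original question
ESTIMATED_USABLE_GBPS = 13.0  # usable estimate quoted in this reply

tx2_rate = link_data_rate_gbps(gen=2, lanes=4)
implied_efficiency = ESTIMATED_USABLE_GBPS / tx2_rate

print(f"TX2 Gen2 x4 after line encoding: {tx2_rate:.1f} Gbps")
print(f"Implied protocol efficiency:     {implied_efficiency:.0%}")
print(f"Meets {REQUIRED_GBPS:.0f} Gbps requirement: {ESTIMATED_USABLE_GBPS >= REQUIRED_GBPS}")
```

In other words, the link carries exactly the required 16 Gbps only before packet overhead is counted, so there is no headroom left for the protocol itself.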
Regarding zero-copy DMA for data that will be used by the iGPU (the internal GPU of the TX2), please refer to the following documentation,
and https://github.com/NVIDIA/jetson-rdma-picoevb could be a good starting point, with example code.

I don’t know if Xavier has that available, but Xavier is far closer to what you want than the TX2 is.

Thank you for the replies and information. Yes, it does look like the PCIe interface on the TX2 will not provide the necessary throughput.

We do not need bi-directional data transport, so the MIPI CSI-2 interface on the TX2 would work, or we can develop a PCIe interface for the AGX.

We could use a CSI-2 interface on the TX2 or the AGX depending on our computing needs.

Using the MIPI interface will complicate the userland application. We will not be able to send the data over a single 12-lane MIPI interface, so using MIPI will require us to split the data across 3 “camera” interfaces with 4 lanes each.
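As a rough illustration of the extra work on the application side, the sketch below stripes a buffer across 3 streams in fixed-size chunks and reassembles it afterwards. The round-robin scheme, the 64-byte chunk size, and the function names are assumptions made up for this example; the real split would depend on how the FPGA packs data onto each interface.

```python
def split_round_robin(payload, streams, chunk_bytes):
    """Cut payload into fixed-size chunks and deal them out to the streams in turn."""
    chunks = [payload[i:i + chunk_bytes] for i in range(0, len(payload), chunk_bytes)]
    return [chunks[s::streams] for s in range(streams)]


def merge_round_robin(per_stream):
    """Reassemble the original byte order from the per-stream chunk lists."""
    merged = []
    longest = max(len(chunks) for chunks in per_stream)
    for i in range(longest):
        for chunks in per_stream:
            if i < len(chunks):  # later streams may hold one chunk fewer
                merged.append(chunks[i])
    return b"".join(merged)


REQUIRED_GBPS = 16.0  # throughput target from the original question
INTERFACES = 3
LANES_PER_INTERFACE = 4

per_interface = REQUIRED_GBPS / INTERFACES
per_lane = per_interface / LANES_PER_INTERFACE
print(f"Per interface: {per_interface:.2f} Gbps, per lane: {per_lane:.2f} Gbps")

frame = bytes(range(256)) * 4  # stand-in for one block of LiDAR data
striped = split_round_robin(frame, INTERFACES, chunk_bytes=64)
assert merge_round_robin(striped) == frame
```

The reassembly only works if all three streams stay in step, so the application would also need some way to detect a dropped or late chunk on any one interface.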

Does the TX2 support the “Generic Long Packet Data Type” or the “User Defined Byte-based Data” from the CSI-2 specification? We will not be using a standard video data type.

FWIW, the AGX’s PCIe is Gen4 x8 capable and can offer around 107 Gbps of usable bandwidth. (This of course needs an endpoint device whose DMA can transfer large amounts of data in one go, i.e. with as little software intervention as possible.)
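A simple model shows why large transfers matter. If every DMA transfer pays a fixed software cost (descriptor setup, interrupt handling, and so on), the effective rate is the transfer size divided by wire time plus that fixed cost. The 10 µs per-transfer overhead below is a made-up placeholder for illustration, not a measured Jetson number, and the function name is hypothetical.

```python
def effective_gbps(transfer_bytes, link_gbps, overhead_us):
    """Effective throughput when each transfer pays a fixed software overhead."""
    bits = transfer_bytes * 8
    wire_time_s = bits / (link_gbps * 1e9)
    total_time_s = wire_time_s + overhead_us * 1e-6
    return bits / total_time_s / 1e9


LINK_GBPS = 107.0    # usable bandwidth figure quoted above
OVERHEAD_US = 10.0   # assumed per-transfer software cost (placeholder)

for size in (4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    rate = effective_gbps(size, LINK_GBPS, OVERHEAD_US)
    print(f"{size // 1024:>6} KiB per transfer: {rate:6.1f} Gbps")
```

With small transfers the fixed cost dominates and the link sits mostly idle; the effective rate only approaches the link rate once each transfer is large enough that wire time dwarfs the overhead.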

The current NVCSI/VI driver may not support “User Defined Byte-based Data”.

Thanks for the replies. I think I am going to go with the PCIe interface. It seems less risky to implement. Our FPGA carrier has a Gen3 x8 interface, which should do nicely on throughput. I will probably use the Xillybus IP and not worry about zero-copy until I have a complete system and can see if it works.
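For that first "see if it works" check, something like the sketch below should be enough to get a throughput number, since Xillybus exposes its streams to user space as device files. The function name is made up, and the device path you pass in depends on how the IP core is configured, so treat both as assumptions.

```python
import time


def measure_read_gbps(path, total_bytes, chunk_bytes=1 << 20):
    """Read up to total_bytes from path; return (bytes_received, rate_in_gbps)."""
    received = 0
    start = time.perf_counter()
    # buffering=0 gives unbuffered reads, so each read() goes straight to the device.
    with open(path, "rb", buffering=0) as stream:
        while received < total_bytes:
            data = stream.read(min(chunk_bytes, total_bytes - received))
            if not data:  # end of stream
                break
            received += len(data)
    elapsed = time.perf_counter() - start
    rate = received * 8 / elapsed / 1e9 if elapsed > 0 else 0.0
    return received, rate
```

A call such as `measure_read_gbps("/dev/xillybus_read_32", 1 << 30)` would read 1 GiB from a stream with that name, if your core defines one. Because every byte is copied into a user-space buffer here, the result is a baseline for the copying path, which is the number to compare against before deciding whether zero-copy is worth the effort.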