PCIe vs. CSI-2 Image Capture

Hi all,

I’m working on an image capture application with some very high pixel count (20MP+), high frame rate imagers. Some of these are natively MIPI CSI-2, others are Sony Sub-LVDS. The raw bandwidth is somewhere near 6Gbps, which requires either a CSI-2 connection that can operate at D-PHY v1.1 speeds, or x2 to x4 Gen2 PCIe lanes (after overhead).
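For anyone wanting to check the lane estimate, here is a rough sketch of the arithmetic. The sensor figures (20 MP, 12 bits per pixel, 25 fps) are assumed values picked only because they multiply out to about 6 Gbps, and the 80% protocol efficiency is a guess at TLP/DLLP overhead, not a measured number. The 8b/10b encoding factor for Gen2 is the only fixed quantity here.

```python
# Assumed sensor parameters (hypothetical; chosen so the product is ~6 Gbps).
PIXELS_PER_FRAME = 20_000_000
BITS_PER_PIXEL = 12
FRAMES_PER_SECOND = 25

GEN2_RATE_GTPS = 5.0            # PCIe Gen2 line rate per lane
ENCODING_EFFICIENCY = 8 / 10    # Gen1/Gen2 use 8b/10b encoding
PROTOCOL_EFFICIENCY = 0.80      # assumed packet overhead; depends on payload size

VALID_LINK_WIDTHS = (1, 2, 4, 8, 16)


def raw_gbps(pixels, bits_per_pixel, fps):
    """Raw image bandwidth in Gbps, ignoring blanking and metadata."""
    return pixels * bits_per_pixel * fps / 1e9


def usable_gbps_per_lane(protocol_efficiency=PROTOCOL_EFFICIENCY):
    """Payload bandwidth of one Gen2 lane after encoding and protocol overhead."""
    return GEN2_RATE_GTPS * ENCODING_EFFICIENCY * protocol_efficiency


def lanes_needed(required_gbps, protocol_efficiency=PROTOCOL_EFFICIENCY):
    """Smallest standard link width that carries the required bandwidth."""
    exact = required_gbps / usable_gbps_per_lane(protocol_efficiency)
    for width in VALID_LINK_WIDTHS:
        if width >= exact:
            return width
    raise ValueError("bandwidth exceeds an x16 Gen2 link")


if __name__ == "__main__":
    need = raw_gbps(PIXELS_PER_FRAME, BITS_PER_PIXEL, FRAMES_PER_SECOND)
    print(f"required: {need:.2f} Gbps")
    print(f"link width at 80% efficiency: x{lanes_needed(need)}")
    print(f"link width at 70% efficiency: x{lanes_needed(need, 0.70)}")
```

With these assumptions an x2 link is just enough, and a slightly worse protocol efficiency pushes it to x4, which is where the "x2 to x4" range comes from.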

I plan to insert an FPGA in this application to support Sub-LVDS bridging, and also to support switching / muxing of incoming image streams (one imager feeding multiple Jetsons). As I’m sure most of you are aware, FPGA-based MIPI solutions are generally limited to 1.5Gbps or less, unless they use a transceiver-based solution, which requires additional components that can push clock/data skew beyond acceptable margins.

PCI Express, being based on a common frequency reference, does not have the same skew problems and has been natively supported in FPGAs for almost a decade. I can easily run Gen2 5 GT/s lanes to my TX2 and support my imager bandwidth. Additionally, as Xavier has shown, I get “free” architectural speed upgrades by simply replacing the FPGA and Tegra with Gen3 or Gen4 PCIe capable devices. I can also run PCIe across copper cabling, fiber optic, and in general longer path lengths than D-PHY.

What are some considerations I should take into account if I wish to DMA this data into memory for GPU-based processing? As the PCIe data path does not pass through VI4, is it now on me in the FPGA to pack/unpack/format pixel data as required by my software application?
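To make the packing question concrete: with PCIe, the layout in memory is whatever the FPGA DMA engine writes, so the software has to agree with it. Below is a minimal sketch that assumes the FPGA keeps a CSI-2 style RAW12 packing (two pixels in three bytes, with both low nibbles in the third byte). That layout is my assumption for illustration; the function names are hypothetical and the real format is whatever I end up defining in the FPGA.

```python
def pack_raw12(pixels):
    """Pack an even number of 12-bit pixels into bytes, two pixels per 3 bytes.

    Assumed layout (CSI-2 RAW12 style):
      byte0 = P0[11:4], byte1 = P1[11:4], byte2 = P1[3:0] << 4 | P0[3:0]
    """
    if len(pixels) % 2:
        raise ValueError("pixel count must be even")
    out = bytearray()
    for i in range(0, len(pixels), 2):
        p0, p1 = pixels[i], pixels[i + 1]
        if not (0 <= p0 < 4096 and 0 <= p1 < 4096):
            raise ValueError("pixel value does not fit in 12 bits")
        out.append(p0 >> 4)
        out.append(p1 >> 4)
        out.append(((p1 & 0x0F) << 4) | (p0 & 0x0F))
    return bytes(out)


def unpack_raw12(packed):
    """Expand packed RAW12 bytes into a list of integer pixel values."""
    if len(packed) % 3:
        raise ValueError("packed length must be a multiple of 3")
    pixels = []
    for i in range(0, len(packed), 3):
        b0, b1, b2 = packed[i], packed[i + 1], packed[i + 2]
        pixels.append((b0 << 4) | (b2 & 0x0F))   # first pixel: low nibble of b2
        pixels.append((b1 << 4) | (b2 >> 4))     # second pixel: high nibble of b2
    return pixels


if __name__ == "__main__":
    line = [0xABC, 0x123, 0x000, 0xFFF]
    packed = pack_raw12(line)
    print(packed.hex())
    print([hex(p) for p in unpack_raw12(packed)])
```

The alternative is to have the FPGA pad each pixel to 16 bits so no unpacking is needed on the host or GPU, at the cost of about 33% more PCIe bandwidth for 12-bit data.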

I’ve seen posts here about maximum PCIe performance, and it seems like there are some MMU settings to take into account. I will of course have to develop my own Linux device driver for controlling the imager device, as well as custom SGDMA IP on the FPGA side.

The only concern is that the ISP can’t take input from the PCIe interface, so you will have to debayer in software.

Thanks Shane, that’s a good point; I’ll look into seeing how much logic area I have in my FPGA but I imagine it’s something I could do in hardware before putting the image data in memory.

You can try the software debayer from the Fastvideo SDK.

These are benchmarks for that SDK on TX2:
https://www.fastcompression.com/solutions/jetson-tx2.htm

For a 4K image with 16-bit data, the high-quality DFPD debayer takes around 7.5 ms, so for 20 MPix it should be around 19 ms, assuming the time scales linearly with pixel count.
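The scaling behind that estimate can be sketched as follows. It assumes "4K" means 3840x2160 and that debayer time is strictly linear in pixel count; both are assumptions, and the benchmark page may use a different 4K resolution. The function names are hypothetical.

```python
REF_TIME_MS = 7.5           # quoted DFPD debayer time for a 4K, 16-bit image
REF_WIDTH = 3840            # assumed 4K width (UHD)
REF_HEIGHT = 2160           # assumed 4K height (UHD)
TARGET_PIXELS = 20_000_000  # 20 MPix imager


def scaled_debayer_ms(ref_ms, ref_width, ref_height, target_pixels):
    """Scale a measured debayer time linearly by pixel count."""
    return ref_ms * target_pixels / (ref_width * ref_height)


def max_fps(frame_time_ms):
    """Upper bound on frame rate if debayer were the only per-frame cost."""
    return 1000.0 / frame_time_ms


if __name__ == "__main__":
    est = scaled_debayer_ms(REF_TIME_MS, REF_WIDTH, REF_HEIGHT, TARGET_PIXELS)
    print(f"estimated debayer time: {est:.1f} ms")
    print(f"debayer-only frame rate ceiling: {max_fps(est):.0f} fps")
```

With these inputs the estimate comes out at about 18 ms, in the same ballpark as the figure above. The frame rate ceiling only accounts for the debayer step, so any other GPU processing per frame lowers it.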