Fastest way to get CSI/MIPI cam into a CUDA memory buffer


I have a Jetson TX2 with a Pi cam V2 connected, and I am wondering what the best method is to get the camera data streaming into CUDA memory. I've looked at the L4T docs, but I'm having a hard time digesting them. I am new to CUDA and typically work on the driver side of things, so maybe I am going about it the wrong way. I would like to go, in the fastest way possible:

Pi cam V2 -> MIPI -> buffer in CUDA memory.

I've seen something about zero copy as well?

Thanks in advance

I’m not sure whether it’s the fastest, but this is pretty darned fast (and you can modify it as you wish).

I believe there are some usage examples in that repository as well as in jetson-inference.
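
On the zero-copy question: on Jetson the CPU and GPU share the same physical DRAM, so a buffer allocated as mapped pinned memory can be filled by the CPU and read by a CUDA kernel without any cudaMemcpy. Here is a minimal sketch of that idea using only the CUDA runtime API. It does not touch the camera at all: the fill loop stands in for whatever capture path writes the frame, and the `invert` kernel and `invertFrameZeroCopy` helper are names I made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: invert an 8-bit grayscale frame in place.
__global__ void invert(unsigned char* frame, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) frame[i] = 255 - frame[i];
}

// Hypothetical helper: allocates one mapped (zero-copy) buffer of n bytes,
// fills it from the CPU, runs the kernel on it through the device alias,
// and copies the first three result bytes into firstThree.
bool invertFrameZeroCopy(int n, unsigned char firstThree[3]) {
    // Request host-memory mapping; the return value is deliberately ignored
    // because this fails harmlessly if a context already exists.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    unsigned char* cpuPtr = nullptr;  // CPU view of the buffer
    unsigned char* gpuPtr = nullptr;  // GPU view of the same memory
    if (cudaHostAlloc((void**)&cpuPtr, n, cudaHostAllocMapped) != cudaSuccess)
        return false;
    if (cudaHostGetDevicePointer((void**)&gpuPtr, cpuPtr, 0) != cudaSuccess) {
        cudaFreeHost(cpuPtr);
        return false;
    }

    // Stand-in for the capture step: the CPU writes the "frame" directly.
    for (int i = 0; i < n; ++i) cpuPtr[i] = (unsigned char)(i % 256);

    // The kernel reads and writes the same memory; no cudaMemcpy is involved.
    invert<<<(n + 255) / 256, 256>>>(gpuPtr, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    // After the sync, the results are visible through the CPU pointer.
    if (ok) for (int i = 0; i < 3; ++i) firstThree[i] = cpuPtr[i];
    cudaFreeHost(cpuPtr);
    return ok;
}
```

Calling `invertFrameZeroCopy(1280 * 720, out)` from your own `main` (the frame size is just an assumed example) and building with nvcc should be enough to try it. Whether this beats a capture path that hands you a device buffer directly depends on how the frames are produced, so treat it as a starting point rather than the answer.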