Let me try to answer your questions with what I have found out so far.
As far as I have seen, transferring buffers from NVMM memory to normal memory always involves a copy, and so does transferring them to CUDA memory. I infer this from the performance I have observed for these operations. Moving data between normal memory and CUDA memory also involves a copy, so I haven’t seen anything that looks like zero-copy between any pairing of normal memory, NVMM memory and CUDA memory, at least when using the GStreamer API.
NVMM memory is, as DaneLLL answered before, a set of DMA buffers. As far as I can tell, it’s just normal memory mapped so that the hardware encoders, decoders and converters can use it. This should mean that the copies go over the memory bus with no extra overhead. The funny thing is that the same is true for CUDA memory (the TX2 has no dedicated GPU memory), yet copying to and from CUDA memory is still more costly than copying normal memory. I haven’t found out why.
All of the above is from experience, so I might be wrong on some points. Take it with a pinch of salt!
In your case, I would suggest taking a look at the Tegra Multimedia API and Argus. They would allow you to map NVMM memory and access it directly. I haven’t used them to receive data from a camera (my use case involved a non-NVMM-enabled camera and the hardware encoder), but as far as I have seen it is possible. Take a look at the 09_camera_jpeg_capture example in the Tegra Multimedia API; it shows how to acquire frames from the camera. After that, you can map the memory with NvBufferMemMap and use it for whatever purpose you have in mind.
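To give a rough idea of that mapping step, here is a minimal C++ sketch. Treat it as an untested outline rather than a reference implementation: the helper names `pack_plane` and `read_plane0` are my own, the Jetson part only compiles when `nvbuf_utils.h` from the Multimedia API is on the include path, and I am assuming a pitch-linear buffer whose plane 0 holds one byte per pixel (for example the Y plane of a YUV420 frame). Mapped rows are padded to the buffer pitch, so the copy goes row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#if __has_include("nvbuf_utils.h")
#include "nvbuf_utils.h"  // Jetson-only header shipped with the Multimedia API
#define HAVE_NVBUF 1
#endif

// Copy a plane whose rows are padded to `pitch` bytes into a tightly
// packed buffer of row_bytes * rows bytes.
std::vector<uint8_t> pack_plane(const uint8_t* src, size_t pitch,
                                size_t row_bytes, size_t rows) {
    assert(pitch >= row_bytes);
    std::vector<uint8_t> out(row_bytes * rows);
    for (size_t y = 0; y < rows; ++y) {
        std::memcpy(out.data() + y * row_bytes, src + y * pitch, row_bytes);
    }
    return out;
}

#ifdef HAVE_NVBUF
// Hypothetical helper: map plane 0 of an NVMM dmabuf fd for CPU reads
// and copy it out. Assumes a pitch-linear layout and 1 byte per pixel.
bool read_plane0(int dmabuf_fd, std::vector<uint8_t>& out) {
    NvBufferParams params;
    if (NvBufferGetParams(dmabuf_fd, &params) != 0) return false;

    void* addr = nullptr;
    if (NvBufferMemMap(dmabuf_fd, 0, NvBufferMem_Read, &addr) != 0) return false;

    // Make hardware writes visible to the CPU before reading.
    NvBufferMemSyncForCpu(dmabuf_fd, 0, &addr);

    out = pack_plane(static_cast<const uint8_t*>(addr),
                     params.pitch[0], params.width[0], params.height[0]);

    NvBufferMemUnMap(dmabuf_fd, 0, &addr);
    return true;
}
#endif
```

The dmabuf fd would come from whatever the capture path hands you (in the camera samples, from the acquired frame). Note that the row-by-row copy into a `std::vector` is itself a copy; if you want to avoid it, you would work on the mapped pointer directly and take the pitch into account.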