Best way to process video on TK1

Hello there!
I am trying to write a simple program that looks like this:
<get frames from v4l (with DMA to host memory)> -> -> ->

Right now I use gstreamer1.0 with the pipeline “appsrc ! nvvidconv ! omxh264enc ! rtph264pay”.
For “appsrc” I have to upload the data to managed memory, process it, and then download it back to host memory so that I can hand it to gstreamer with the “push-buffer” signal. Does gstreamer upload the data back to device memory in “nvvidconv ! omxh264enc”?
I also tried another approach (so-called “zero-copy”). It works, but unfortunately it is not much faster.
So my question is: what is the best way (for the highest FPS) to do image processing on TK1?
Is it possible to pass data to “omxh264enc” in device memory (so-called NVMM)? Or should I use “managed memory” or “mapped memory” (zero-copy)?
PS: Sorry if this is the most popular question on this forum. I did not find an answer.

We have an optimization for the v4l2 source in the MMAPI sample 12_camera_v4l2_cuda, but it is not supported on the TK1 release.

On TK1, you have to do a memcpy in ‘appsrc -> nvvidconv’.
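
To get a feel for what that copy costs, here is a rough back-of-the-envelope sketch. It is not taken from any NVIDIA sample. The frame format (I420), the 1920x1080 resolution, the 30 FPS rate, and the helper names `i420_frame_bytes` and `copied_mb_per_s` are all assumptions made for illustration; plug in your own capture settings.

```python
def i420_frame_bytes(width: int, height: int) -> int:
    """Bytes in one I420 frame: a full-resolution Y plane plus
    two chroma planes subsampled by 2 in each direction."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h


def copied_mb_per_s(width: int, height: int, fps: float, copies_per_frame: int) -> float:
    """Bytes copied per second, in MB (1 MB = 10**6 bytes), when every
    frame is copied `copies_per_frame` times on its way through the pipeline."""
    return i420_frame_bytes(width, height) * fps * copies_per_frame / 1e6


if __name__ == "__main__":
    # Assumed settings: 1080p I420 at 30 FPS (hypothetical, adjust as needed).
    for copies in (1, 2, 3):
        rate = copied_mb_per_s(1920, 1080, 30, copies)
        print(f"{copies} copy/copies per frame: {rate:.1f} MB/s")
```

Under these assumptions a single copy per frame moves roughly 93 MB every second, and each extra upload or download in the chain adds the same amount again, so counting the copies per frame is a reasonable first step before tuning anything else.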
Running at max performance should bring some improvement. Please try