Performance of the CUDA H.264 encoder

My program renders an image with the OpenGL API, reads it back with "glReadPixels()", and then sends it to the NVIDIA CUDA encoder to generate an H.264 stream. I now want to optimize the program using the "NVVE_DEVICE_MEMORY_INPUT" flag. My thinking is that if I can hand the image to the encoder while it is still in device memory, I avoid copying it to the host and back, so a lot of I/O bandwidth is saved and the program should run faster. Is this reasoning correct, and is the optimization worth doing?
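To put a rough number on the bandwidth in question, here is a back-of-the-envelope sketch in C. The function names are hypothetical, and the figures used below (1920x1080, 4 bytes per pixel, 30 fps) are assumptions for illustration only, not my actual settings. It assumes each frame crosses the bus twice in the current path: once from device to host in the readback, and once from host to device when the encoder uploads it.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one uncompressed frame.
 * bytes_per_pixel is an assumption about the readback format
 * (e.g. 4 for an RGBA readback). */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Bus traffic in bytes per second when every frame crosses the bus
 * `crossings` times (assumed: 2 for readback plus upload,
 * 0 if the frame stays in device memory). */
uint64_t bus_bytes_per_second(uint64_t bytes_per_frame, uint32_t fps,
                              uint32_t crossings)
{
    return bytes_per_frame * fps * crossings;
}
```

With those assumed figures, frame_bytes(1920, 1080, 4) is 8294400 bytes per frame, and bus_bytes_per_second(8294400, 30, 2) is 497664000 bytes per second, i.e. roughly 0.5 GB/s of transfers that device-memory input would avoid. Whether that is a bottleneck depends on the bus and on how much time the encode itself takes.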