Get raw video frames as fast as possible

I’m trying to use the NVIDIA Video Decoder [1] under Linux. I need to get the decoded frames in greyscale into an array or similar buffer as fast as possible. I’ve compiled the sample [2] and it runs at 1100 fps, which is OK. By default the video is not shown, but with “-displayvideo” the output is extremely slow. I also found the statement “For Linux platforms, you will need to write your own video source and parsing functions that connect to the Video Decoding functions.” in [3], yet with “-displayvideo” I definitely get a video on my desktop. Nevertheless, I’m stuck at this point. Is there an example that shows how to use this library in a minimal way? The cudaDecodeGL example at [4] is much too complicated to understand easily. I just need to get at the raw frames somehow.
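
To make the goal concrete: as far as I understand, the decoder delivers frames in NV12, so the greyscale image should simply be the luma (Y) plane at the start of each frame. Below is a rough sketch of the result I’d like to end up with. It is plain C++ with no CUDA or NVCUVID calls, and it assumes the frame has already been copied to host memory with a row pitch that may be larger than the width. The name extractLuma and its parameters are my own invention, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of NVCUVID): copy the luma (Y) plane out of
// an NV12 frame that already resides in host memory.
// Assumed NV12 layout: `height` rows of Y, each `pitch` bytes long
// (pitch >= width), followed by the interleaved UV plane, which is ignored.
std::vector<std::uint8_t> extractLuma(const std::uint8_t* nv12,
                                      std::size_t width,
                                      std::size_t height,
                                      std::size_t pitch)
{
    assert(nv12 != nullptr && pitch >= width);

    // Tightly packed greyscale image: exactly width * height bytes.
    std::vector<std::uint8_t> grey(width * height);

    // Copy row by row so the padding bytes at the end of each row are dropped.
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(grey.data() + row * width, nv12 + row * pitch, width);
    }
    return grey;
}
```

What I’m missing is the part before this step: how to feed the video to the decoder and get hold of each decoded frame without going through the OpenGL display path.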

[1] http://docs.nvidia.com/cuda/video-decoder/index.html
[2] /usr/local/cuda/samples/3_Imaging/cudaDecodeGL
[3] http://docs.nvidia.com/cuda/samples/3_Imaging/cudaDecodeGL/doc/nvcuvid.pdf
[4] http://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-video-decoder-gl-api