Performance of copying decoded video to CPU memory?

I am decoding a 640x480 H.264 bitstream on an Ion2 (my OS is Linux). If I decode the video and render it with OpenGL,
the frame rate is 216 fps and the CPU load is about 40%.
However, if I use page-locked (pinned) memory to copy the decoded ARGB video to CPU memory, the frame rate drops to
112 fps and the CPU load is about 27%.
So the speed decreases significantly when I copy the decoded video from the GPU to the CPU.
Is this expected? Can I make the copy faster?
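
For reference, here is a rough back-of-the-envelope calculation of what the copy seems to cost per frame, using the numbers above. It is only a sketch: it assumes 4 bytes per ARGB pixel (8 bits per channel) and that the whole slowdown comes from a copy that runs serially with decoding, which may not be the case. The function names are just for illustration.

```python
# Numbers taken from the post above.
WIDTH, HEIGHT = 640, 480
FPS_RENDER_ONLY = 216.0   # decode + OpenGL render, no copy to CPU
FPS_WITH_COPY = 112.0     # decode + copy ARGB frame to page-locked CPU memory

# Assumption: ARGB with 8 bits per channel, i.e. 4 bytes per pixel.
BYTES_PER_PIXEL = 4


def frame_bytes(width, height, bytes_per_pixel):
    """Size of one decoded frame in bytes."""
    return width * height * bytes_per_pixel


def extra_ms_per_frame(fps_before, fps_after):
    """Extra time per frame (in ms) implied by the drop in frame rate."""
    return (1.0 / fps_after - 1.0 / fps_before) * 1000.0


def implied_bandwidth_mb_s(nbytes, extra_ms):
    """Copy bandwidth in MB/s if the extra time is spent only on the copy."""
    return nbytes / (extra_ms / 1000.0) / 1e6


if __name__ == "__main__":
    size = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
    extra = extra_ms_per_frame(FPS_RENDER_ONLY, FPS_WITH_COPY)
    print("bytes per frame:", size)
    print("extra time per frame (ms):", round(extra, 2))
    print("implied copy bandwidth (MB/s):",
          round(implied_bandwidth_mb_s(size, extra), 1))
    print("data actually moved at 112 fps (MB/s):",
          round(size * FPS_WITH_COPY / 1e6, 1))
```

If these assumptions hold, the copy adds a few milliseconds per frame, so I would like to know whether that is normal for this hardware or whether the copy can be overlapped with decoding.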