About NVIDIA GPUDirect for Video

Sorry to say this, but AMD DirectGMA on FirePro SDI cards seems better, since it is equivalent to CUDA P2P GPU-to-GPU functionality and avoids host copies. Yours goes through host memory, although to its credit it needs no extra copies on the host. Also, sorry, but I don’t see what special feature NVIDIA GPUDirect for Video adds: since CUDA 4.0 there has been cudaHostRegister for pinning host memory that was allocated without the CUDA host allocation calls. I hope you soon move to P2P PCIe transfers that avoid host transfers, similar to DirectGMA from AMD.
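
For reference, the pinning I mean looks roughly like this. It is only a minimal sketch: the buffer size and variable names are made up for illustration, and a plain malloc stands in for whatever buffer a capture SDK would hand you.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Illustrative helper (not part of CUDA): abort on any runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(1);                                                  \
        }                                                             \
    } while (0)

int main() {
    const size_t bytes = 16u << 20;  // 16 MiB, arbitrary size for the sketch

    // Memory allocated WITHOUT cudaMallocHost/cudaHostAlloc.
    void* host = malloc(bytes);
    if (!host) return 1;
    memset(host, 0, bytes);

    // Page-lock the existing allocation so the GPU can DMA from it directly.
    CHECK(cudaHostRegister(host, bytes, cudaHostRegisterDefault));

    void* dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));

    // With pinned memory this copy can run as a true asynchronous DMA transfer.
    CHECK(cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, 0));
    CHECK(cudaStreamSynchronize(0));

    CHECK(cudaHostUnregister(host));
    CHECK(cudaFree(dev));
    free(host);
    printf("done\n");
    return 0;
}
```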

CUDA GPU to GPU memory copies don’t go through the host (at least I think that’s what you’re asserting) - from the programming guide, section 3.2.6:
“Note that if peer-to-peer access is enabled between two devices via cudaDeviceEnablePeerAccess() as described in Section 3.2.6.4, peer-to-peer memory copy between these two devices no longer needs to be staged through the host and is therefore faster.”
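
As a rough illustration of the quoted passage, here is a minimal sketch. It assumes a machine with at least two GPUs visible as devices 0 and 1; the device indices, the buffer size and the CHECK macro are my own choices for illustration, not anything from the guide.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper (not part of CUDA): abort on any runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(1);                                                  \
        }                                                             \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    // Ask the runtime whether each device can address the other's memory.
    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));

    const size_t bytes = 64u << 20;  // 64 MiB, arbitrary size for the sketch
    void *buf0 = nullptr, *buf1 = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&buf0, bytes));
    CHECK(cudaMemset(buf0, 0, bytes));
    if (can01) CHECK(cudaDeviceEnablePeerAccess(1, 0));  // device 0 -> device 1

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&buf1, bytes));
    if (can10) CHECK(cudaDeviceEnablePeerAccess(0, 0));  // device 1 -> device 0

    // Copy from device 0 to device 1. With peer access enabled this
    // does not need to be staged through host memory.
    CHECK(cudaMemcpyPeer(buf1, 1, buf0, 0, bytes));
    CHECK(cudaDeviceSynchronize());

    printf("peer access 0->1: %d, 1->0: %d\n", can01, can10);

    CHECK(cudaFree(buf1));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(buf0));
    return 0;
}
```

If cudaDeviceCanAccessPeer reports 0 for the pair, the cudaMemcpyPeer call still succeeds, but the runtime falls back to staging the copy through the host.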