cuvidVideoDecoder example (SDK2.0) hangs computer on GTX280 cuMemcpyDtoD - OK cudaPostProcessFram

I am trying to build and run the sample application “cudaVideoDecode” on a GTX280 card.
(I also have two Tesla C870 cards installed in this computer.)

Every time I launch it, the computer hangs (I suspect the video card stops driving the display).

Then I found an option to display grayscale images.
When I enable cuMemcpyDtoD (videoDecode.cpp:439) instead of cudaPostProcessFrame (videoDecode.cpp:443), the application works almost correctly: it displays 4 copies of the video clip, side by side horizontally, in grayscale.
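One possible reading of the four grayscale copies (an assumption, not confirmed anywhere in this thread) is that the raw copy puts 1-byte-per-pixel luma data into a buffer that is displayed as 4-byte-per-pixel ARGB, so each displayed row holds four source rows. The sketch below only models that byte arithmetic on the CPU; `SourcePos` and `sourceOfDestByte` are hypothetical names, and it assumes a tightly packed luma plane with the same width as the destination.

```cpp
#include <cassert>

// Position of a byte in the 1-byte-per-pixel source plane.
struct SourcePos { int row; int col; };

// Which source byte lands in channel `channel` (0..3) of destination pixel
// (x, y) when a tightly packed 8-bit plane of `width` columns is byte-copied
// into a buffer of 32-bit pixels that also has `width` columns.
// Assumption: no row padding (pitch == width) on either side.
SourcePos sourceOfDestByte(int x, int y, int channel, int width) {
    assert(width > 0 && channel >= 0 && channel < 4);
    // Linear byte offset of this channel inside the destination buffer.
    long offset = (static_cast<long>(y) * width + x) * 4L + channel;
    // The same offset read back as (row, col) in the source plane.
    return { static_cast<int>(offset / width), static_cast<int>(offset % width) };
}
```

With `width = 16`, destination pixels 0, 4, 8 and 12 of the first displayed row start at source rows 0, 1, 2 and 3, which would show up as four narrow panels next to each other if this is what the copy path does.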

The issue is that cudaPostProcessFrame launches a kernel rather than doing a blind memory copy.
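For comparison, here is a rough CPU sketch of the per-pixel work an NV12-to-ARGB conversion does. It is not the SDK kernel: it assumes the common BT.601 limited-range integer coefficients, a luma plane followed directly by the interleaved UV plane, and the hypothetical names `clamp8` and `nv12PixelToArgb`. The sample itself may use a different colour matrix.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value to the 0..255 range of one colour channel.
static std::uint8_t clamp8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Convert one pixel of an NV12 frame to packed ARGB (0xAARRGGBB).
// Assumed layout: `pitch * height` bytes of luma, then `pitch * height / 2`
// bytes of interleaved U,V samples at half resolution in both directions.
std::uint32_t nv12PixelToArgb(const std::vector<std::uint8_t>& frame,
                              int pitch, int height, int x, int y) {
    const std::uint8_t* luma = frame.data();
    const std::uint8_t* chroma =
        frame.data() + static_cast<std::size_t>(pitch) * height;

    int c = luma[y * pitch + x] - 16;                         // Y
    int d = chroma[(y / 2) * pitch + (x / 2) * 2] - 128;      // U
    int e = chroma[(y / 2) * pitch + (x / 2) * 2 + 1] - 128;  // V

    // BT.601 limited-range fixed-point conversion (an assumption, see above).
    std::uint32_t r = clamp8((298 * c + 409 * e + 128) / 256);
    std::uint32_t g = clamp8((298 * c - 100 * d - 208 * e + 128) / 256);
    std::uint32_t b = clamp8((298 * c + 516 * d + 128) / 256);

    return 0xFF000000u | (r << 16) | (g << 8) | b;
}
```

Running a few decoded frames through something like this and comparing against the kernel output could help tell a wrong launch configuration apart from a wrong conversion.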

Right before the hang, the debugger reports that “cuvidMapVideoFrame” returns the error code CUDA_ERROR_LAUNCH_FAILED.

Does anybody know what is happening?
I imagine that the kernel “NV12ToARGB_drvapi” is being launched in some way that is incompatible with the GTX280.
How do I launch it correctly?

I could not find this application in the SDK. Where is it?

The solution from Eric Young is here (3.15 MB)

It is in the “projects” folder.