I tried the CUDA Video Encoder introduced in the CUDA 3.2 SDK, and it did work!
However, I don’t know how to play the .264 file output by the CUDA Video Encoder. I tried lots of codecs and players (FFMPEG, FFDSHOW, MPC, VLC…), and they all failed to render it. The CUDA SDK Decoder also couldn’t decode it correctly…
Won’t the output of the encoder API be the “naked” video stream? You will still need to embed the video into a suitable container format then write that out to disk. Most decoder libraries can’t detect the video format of a raw bitstream, and need metadata from the container to work out how to play the video.
VLC (which, by the way, does not depend on DirectShow in any way) is able to play elementary H.264 bitstreams. The stream has to begin with an SPS (sequence parameter set: 00 00 00 01 67, i.e. the start code 00 00 00 01 followed by the NAL header byte 67), followed by a PPS (picture parameter set: 00 00 00 01 68), usually followed by a coded I slice (00 00 00 01 65). Most encoders insert the SPS/PPS combination before an I frame. Some encoders write 00 00 00 01 27 for the SPS and 00 00 00 01 28 for the PPS; the NAL unit type in the low five bits of the header byte is the same (7 and 8), only the priority bits differ. The SPS/PPS is needed to set up the decoder. It contains the required information such as the resolution and, if the encoder writes it, the frame rate. For VLC I use the file extension *.h264. Alternatively you can use YAMB2 (Yamb » Yet Another MP4Box UI) to create an mp4 file that can also be played by MPC-HC (Media Player Classic - Home Cinema). If YAMB cannot multiplex a valid mp4 file, then the bitstream is corrupt.
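To check whether a dump really starts with SPS, PPS and then a slice, it can help to list the NAL units in the file. The following is only a minimal sketch in Python, assuming the file is a plain byte stream with the start codes described above; the function name `iter_nal_units` and the name table are made up for illustration and are not part of the CUDA SDK.

```python
# Hypothetical helper: list the NAL units found in a raw H.264 byte stream.
# It only looks at start codes and the header byte; it does not parse the payload.

NAL_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}


def iter_nal_units(data: bytes):
    """Yield (offset, nal_unit_type) for every start code found in data.

    The offset points at the 3-byte pattern 00 00 01; a 4-byte start code
    (00 00 00 01) simply has one more zero byte in front of it.
    """
    pos = 0
    while True:
        pos = data.find(b"\x00\x00\x01", pos)
        if pos < 0 or pos + 3 >= len(data):
            return
        header = data[pos + 3]
        # The low five bits of the header byte are the NAL unit type,
        # so 0x67 and 0x27 both map to type 7 (SPS).
        yield pos, header & 0x1F
        pos += 3


def describe(data: bytes):
    """Return a list of readable names for the NAL units in data."""
    return [NAL_NAMES.get(t, f"type {t}") for _, t in iter_nal_units(data)]
```

Reading the encoder output with `open(path, "rb").read()` and passing it to `describe` should give a list starting with SPS and PPS for a healthy stream. If it starts with a slice instead, the decoder has nothing to configure itself with.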
If you start recording in the middle of the bitstream, try to find the first SPS header and cut off all bytes before it. If you cannot find an SPS header, then something is wrong.
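Cutting off everything before the first SPS could be done roughly like this. Again this is just a sketch, not something from the SDK: `trim_to_first_sps` is a made-up name, and it assumes the dump uses the usual 00 00 01 / 00 00 00 01 start codes.

```python
# Hypothetical helper: drop all bytes in front of the first SPS NAL unit.

def trim_to_first_sps(data: bytes) -> bytes:
    """Return data starting at the start code of the first SPS.

    Raises ValueError if no SPS (NAL unit type 7) is found.
    """
    pos = 0
    while True:
        pos = data.find(b"\x00\x00\x01", pos)
        if pos < 0 or pos + 3 >= len(data):
            raise ValueError("no SPS found, the bitstream is probably broken")
        if data[pos + 3] & 0x1F == 7:
            # Keep the extra leading zero byte if a 4-byte start code was used.
            start = pos - 1 if pos > 0 and data[pos - 1] == 0 else pos
            return data[start:]
        pos += 3
```

The result can then be written back with `open(out_path, "wb").write(...)` and tried in VLC. A `ValueError` here corresponds to the "something is wrong" case above.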
I was able to successfully write the bitstream into an AVI with the C API and to an mp4 file using the DirectShow filter (this only worked with the ATI MPEG Multiplexer).