Video Encoding

Hi everyone,

I have heard that NVIDIA GPUs have a video processor. Is it programmable?


I’m not sure, but I wouldn’t be surprised if video decoding is just one of the tasks handled by the general-purpose unified shader units, the same ones that run CUDA, vertex, pixel, geometry, etc. shaders.

I think it’s a separate unit called VP2 (Video Processor 2), which relieves the CPU of work when processing the MPEG-2, WMV, H.264 and VC-1 formats.

There’s also a Bitstream Processor (BSP) which handles the first decode stage (bitstream processing) for H.264.
For VC-1 this stage is still handled by the CPU.

Please correct me if I’m wrong :)

But I don’t know if they are programmable.

But why would they go through the cost and trouble of adding a separate unit if their main unit is fully programmable? As video decoding requires massive processing power, it makes no sense to me.

Indeed, the first decode stage would be very hard or even impossible to do with parallel processing, so it makes sense to have separate circuitry for that.
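To illustrate why that first stage resists parallel processing, here is a minimal Python sketch of decoding unsigned Exp-Golomb codes, the variable-length code H.264 uses for many syntax elements. The helper names (read_ue, decode_all) and the use of a '0'/'1' string as the bitstream are assumptions made for illustration only. This is not how VP2 or the BSP works internally. The point is only that the start of each symbol is unknown until the previous one has been decoded.

```python
def read_ue(bits, pos):
    """Decode one unsigned Exp-Golomb code from a '0'/'1' string, starting at pos.

    Returns (value, next_pos). Hypothetical helper, for illustration only.
    """
    zeros = 0
    while bits[pos + zeros] == "0":      # count leading zeros
        zeros += 1
    start = pos + zeros + 1              # skip the terminating '1'
    suffix = bits[start:start + zeros]   # 'zeros' more bits follow the '1'
    value = (1 << zeros) - 1 + (int(suffix, 2) if zeros else 0)
    return value, start + zeros


def decode_all(bits):
    """Decode every code in the stream, strictly one after another."""
    pos, out = 0, []
    while pos < len(bits):
        # The next start position is only known after this call returns,
        # so symbol k+1 cannot be located before symbol k is decoded.
        value, pos = read_ue(bits, pos)
        out.append(value)
    return out


stream = "1" + "010" + "011" + "00100"   # codes for 0, 1, 2, 3
print(decode_all(stream))
```

A thread dropped into the middle of such a stream has no way to tell where a symbol begins, which is why this stage maps poorly onto many independent threads.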

I wonder how much of this is true:

This AES128 engine also sounds very useful.

Sounds reasonable, wumpus.

Maybe some NVIDIA guys around here know the exact implementation of the video encoding.

For some reason unknown to me, the NVIDIA guys don’t say a thing about their video processor or how we can use it. I read somewhere that VP2 accounts for most of the transistor increase in the 8800 GT.

We’re happy to talk about the video processor. The VP is a separate hardware unit and is programmable (although not easily), but currently the only way for developers to access its functionality is via DirectShow, or via XvMC on Linux. We’re still investigating whether it makes sense to expose this hardware from CUDA.

Thanks for the information!

Would it be programmable enough to, for example, add a custom codec? Or is it hardwired to H.264 and MPEG-2?

The advantage of programming the VP would be that it can decode bitstreams, using algorithms like the arithmetic decoding in H.264. This is very hard to do in the CUDA programming model currently.
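As a rough illustration of why arithmetic decoding is hard to spread across threads, here is a toy adaptive binary arithmetic coder in Python. It is not CABAC as specified in H.264: it uses exact fractions instead of fixed-width integer ranges, and a simple count-based probability model that I made up for the example. The function names (p_zero, encode, decode) are likewise hypothetical. What it does share with the real thing is the dependency chain: every decoded bit changes both the interval and the probability used for the next bit.

```python
from fractions import Fraction


def p_zero(c0, c1):
    """Adaptive estimate of P(bit == 0) from the counts seen so far."""
    return Fraction(c0, c0 + c1)


def encode(bits):
    """Narrow [low, low + width) once per bit; return a number inside it."""
    low, width = Fraction(0), Fraction(1)
    c0, c1 = 1, 1                        # start with one pseudo-count each
    for b in bits:
        split = width * p_zero(c0, c1)   # size of the sub-interval for '0'
        if b == 0:
            width = split
            c0 += 1
        else:
            low += split
            width -= split
            c1 += 1
    return low + width / 2


def decode(code, n):
    """Recover n bits; each step needs the state left by the previous one."""
    low, width = Fraction(0), Fraction(1)
    c0, c1 = 1, 1
    out = []
    for _ in range(n):
        split = width * p_zero(c0, c1)
        if code < low + split:           # code falls in the '0' sub-interval
            out.append(0)
            width = split
            c0 += 1
        else:                            # code falls in the '1' sub-interval
            out.append(1)
            low += split
            width -= split
            c1 += 1
    return out


message = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
assert decode(encode(message), len(message)) == message
```

Bit k cannot be decoded without the interval and counts produced by bits 0 to k-1, so the loop in decode is a serial chain by construction.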

Access to the video processor from CUDA would be great. Access via xVideo on Linux is limited.