Hardware H264 encoding and decoding for high-performance video streaming


I am the developer of an app that provides live video streaming. I plan to spend some money on NVIDIA Tesla GPUs to do the video encoding, but I do not really think it is a good choice.

Here are the details. I have source stream frames that are already decoded, and I re-encode them using an open-source H264 codec. This work is very time-consuming for the CPU, and CPU time is relatively expensive. This is why I want to move the heavy work to a high-end GPU. I believe it should be possible, with a small amount of work, to replace the open-source codec with an H264 GPU API of some sort.

I know there is a Windows API for the H264 hardware codec, along with an example. It works fine; however, I cannot run a second instance of this application. Is this a hardware limitation? Also, I know I can run up to 4 NVIDIA CUDA kernels, but is this problem related to that limitation? Windows is also not a great platform for me, and there is no such video encoding API on Linux. How much would I have to pay to obtain such an API for Linux?

How should I accomplish this? Real-time H264 encoding on the GPU is the primary goal, but the problem, in my opinion, is that I cannot have more video streams than physical GPU cards.

If there is no way NVIDIA would license this API for Linux, which commercial GPU H264 codec would you recommend for such a solution?

I hope someone can clear this up for me. Help!