We have achieved 25 frames per second for 1920x1080, 4:2:2, 8-bit video, in both lossless and lossy encoding modes, by porting the most time-consuming modules to CUDA.
We used an inexpensive PC platform costing only about 1,600 USD (not including the display), based on an Intel Core 2 Quad Q6600 CPU and an NVIDIA GeForce 9800 GX2 graphics card.
The current platform is MS Windows. Porting to Linux is straightforward and can be done if you require it. Performance can also be improved with an upgraded configuration (e.g. a system of around 3,000 USD for real-time encoding of 4:4:4 10-bit video).
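To put the two configurations in perspective, the following short Python sketch estimates the raw (uncompressed) input data rate the encoder has to consume in each case. It is only a back-of-the-envelope calculation: the function name is hypothetical, and it assumes tightly packed samples (no padding of 10-bit samples to 16 bits) and 25 frames per second for the 4:4:4 10-bit case as well.

```python
def raw_rate_bytes_per_sec(width, height, fps, bit_depth, chroma):
    """Estimate the uncompressed video data rate in bytes per second.

    Assumption: samples are tightly packed, with no padding bits.
    """
    # Average number of samples per pixel for each chroma format.
    samples_per_pixel = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}[chroma]
    bits_per_frame = width * height * samples_per_pixel * bit_depth
    return bits_per_frame * fps / 8


if __name__ == "__main__":
    hd_422_8 = raw_rate_bytes_per_sec(1920, 1080, 25, 8, "4:2:2")
    hd_444_10 = raw_rate_bytes_per_sec(1920, 1080, 25, 10, "4:4:4")
    print(f"4:2:2  8-bit: {hd_422_8 / 1e6:.2f} MB/s")   # 103.68 MB/s
    print(f"4:4:4 10-bit: {hd_444_10 / 1e6:.2f} MB/s")  # 194.40 MB/s
```

Under these assumptions the 4:4:4 10-bit case carries 1.875 times as much raw data per second as the 4:2:2 8-bit case, which is consistent with needing a faster configuration.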
For cooperation ideas, questions, or comments, feel free to contact us (firstname.lastname@example.org).