Stereo disparity computation: how to increase performance on Jetson?

I have been working with stereo cameras to calculate a disparity map using OpenCV for Tegra 2.4.10. I have installed CUDA Toolkit 6.5 and tried the stereo block matching algorithms in the gpu module. As mentioned in the wiki http://elinux.org/Jetson/Computer_Vision_erformance, the GPU BM algorithm gives a higher frame rate, but the resulting disparity map is not very good.

I get a better result with the CSBP algorithm, but the frame rate drops to 1 fps, and I also ran into a hang where the window stops updating the preview. What could be causing the hang? Is 1 fps the maximum performance achievable for CSBP on the Jetson? How can I improve the frame rate?

Thanks in advance!