CUDA operating on 32fp RTT
Need feedback on how to use CUDA to post-process a 32-bit floating-point render-to-texture (RTT) target

Using CUDA 4.0 and GCC.

I have an app that renders in two passes. In the first pass, the scene is rendered to a 32-bit floating-point texture via an FBO. The second pass takes this texture, applies some post-processing and calibration, and displays the result on a full-screen quad. I now need to add some CUDA processing on the texture before the second pass. Looking through the CUDA SDK samples, I came across postProcessGL, which appears to be close to what I need, but it uses PBOs. I am wondering whether what I need to do is possible with CUDA. As I see it, the general steps are as follows:

Render to a 32-bit float texture attached to an FBO.
Pass this 32-bit float texture to a CUDA kernel, which modifies the texture.
Pass the same texture to the next render pass for more post-processing, calibration, and display.
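
To make the middle step concrete, here is a rough sketch of how it might look with the CUDA graphics interop API, without a PBO. It is only an illustration under several assumptions: the texture is a GL_TEXTURE_2D with an RGBA 32-bit float internal format that CUDA accepts for registration, an OpenGL context is current, and the names scaleKernel, registerTexture, processTexture, scratch, and gain are hypothetical placeholders. Since a kernel cannot write to a mapped cudaArray through an ordinary pointer, the sketch copies the texels into a linear device buffer, runs the kernel there, and copies them back.

```cuda
#include <GL/gl.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Hypothetical post-process: scale the RGB channels of every texel
// in a linear RGBA32F buffer. Alpha is left untouched.
__global__ void scaleKernel(float4* pixels, int width, int height, float gain)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float4 p = pixels[y * width + x];
    p.x *= gain;
    p.y *= gain;
    p.z *= gain;
    pixels[y * width + x] = p;
}

// Register the GL texture with CUDA once, after it has been created.
// Assumes the GL context that owns `tex` is current on this thread.
// Returns NULL if registration fails (for example, unsupported format).
cudaGraphicsResource* registerTexture(GLuint tex)
{
    cudaGraphicsResource* res = NULL;
    cudaError_t err = cudaGraphicsGLRegisterImage(
        &res, tex, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsNone);
    return (err == cudaSuccess) ? res : NULL;
}

// Call between the first and second render pass.
// `scratch` is an assumed cudaMalloc'd buffer of width * height float4s.
cudaError_t processTexture(cudaGraphicsResource* res, float4* scratch,
                           int width, int height, float gain)
{
    cudaError_t err = cudaGraphicsMapResources(1, &res, 0);
    if (err != cudaSuccess) return err;

    // A mapped texture is exposed as a cudaArray, not a raw pointer.
    cudaArray* array = NULL;
    err = cudaGraphicsSubResourceGetMappedArray(&array, res, 0, 0);

    size_t rowBytes = width * sizeof(float4);
    if (err == cudaSuccess)
        err = cudaMemcpy2DFromArray(scratch, rowBytes, array, 0, 0,
                                    rowBytes, height, cudaMemcpyDeviceToDevice);

    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        scaleKernel<<<grid, block>>>(scratch, width, height, gain);
        err = cudaGetLastError();
    }

    // Write the modified texels back into the same texture.
    if (err == cudaSuccess)
        err = cudaMemcpy2DToArray(array, 0, 0, scratch, rowBytes,
                                  rowBytes, height, cudaMemcpyDeviceToDevice);

    // Always unmap so OpenGL can use the texture again.
    cudaError_t unmapErr = cudaGraphicsUnmapResources(1, &res, 0);
    return (err != cudaSuccess) ? err : unmapErr;
}
```

In this sketch the texture would need to be idle on the GL side while it is mapped, so the FBO would be unbound before calling processTexture and the texture bound for the second pass only after it returns. The two device-to-device copies are the price of avoiding a PBO; on hardware that supports surface writes, registering with cudaGraphicsRegisterFlagsSurfaceLoadStore and writing through a surface may avoid them, but that variant is not shown here.
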

Is this doable with CUDA 4.0, and does postProcessGL implement the process I need to use?