I think the binaries I compiled with CUDA SDK 2.3 ran just fine on my Fermi device (GTX 460). I only use the runtime API for my projects.
As far as I know, nvcc embeds the generated PTX code in the executable by default when using the 2.3 SDK or later. This allows the CUDA 4.0 driver to translate (JIT-compile) that PTX as needed for the Fermi architecture, even when the binary is shipped with the CUDA 2.3 runtime library.
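If you want to check this on your own machine, a small test program along these lines might help. It is only a sketch: the kernel `fill_index` and the helpers `version_major` / `version_minor` are names I made up for illustration, and it assumes device 0 is the card you care about. It prints the driver and runtime versions plus the compute capability, then launches a trivial kernel. If the binary contains no code the driver can use for that device, I would expect the launch to report an error rather than succeed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical trivial kernel: each thread writes its global index.
__global__ void fill_index(int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

// CUDA reports versions as 1000 * major + 10 * minor (e.g. 2030 for 2.3).
static int version_major(int v) { return v / 1000; }
static int version_minor(int v) { return (v % 1000) / 10; }

int main() {
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // version supported by the installed driver
    cudaRuntimeGetVersion(&runtime);  // version of the runtime library in use

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device found\n");
        return 1;
    }
    printf("driver %d.%d, runtime %d.%d, device \"%s\" (compute %d.%d)\n",
           version_major(driver), version_minor(driver),
           version_major(runtime), version_minor(runtime),
           prop.name, prop.major, prop.minor);

    const int n = 64;
    int host[n] = {0};
    int *dev = 0;
    if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    fill_index<<<1, n>>>(dev);
    cudaError_t err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);

    if (err != cudaSuccess) {
        printf("kernel did not run: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (host[n - 1] != n - 1) {
        printf("kernel ran but produced unexpected results\n");
        return 1;
    }
    printf("kernel ran correctly on this device\n");
    return 0;
}
```

Building this with an older toolkit and running it on a newer card with a newer driver should show a runtime version older than the driver version, with the kernel still running if the PTX was embedded as described above.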
I confess I haven’t switched to CUDA 3.x or 4.0 yet. 2.3 is working fine for all of my projects.