Be aware that the hardware interpolator has only 9-bit precision (it is built for RGBA8 image data). To get better precision, fetch the raw texels (cudaReadModeElementType with point sampling, i.e. cudaFilterModePoint) and do the interpolation yourself in CUDA.
Well, lerp is not that expensive. In my experience, it is usually well hidden in the read latency as code pieces that do interpolation are usually memory bound anyway.
This is a bit misleading. To be exact, it is the interpolation weight (the fractional part of the texture coordinate) that is stored in 9-bit fixed point, with 8 bits of fractional value. This means there are only 2^8 = 256 possible interpolated positions between neighboring texels, which can show up as visible steps at large magnifications.
The resulting values will still be floating point.
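To make the effect concrete, here is a rough sketch that compares a full-precision lerp with one whose weight is snapped to 8 fractional bits. It is only an emulation under assumptions: the helper names (`lerp_full`, `quantize_weight`, `lerp_quantized`) are made up for illustration, and round-to-nearest is a guess, since the exact rounding the hardware applies is not something I can confirm. The functions are marked `__host__ __device__` so they can be dropped into a kernel, but they also compile as plain C++.

```cpp
#include <math.h>

// Let the same code compile with a plain C++ compiler (no nvcc).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Full-precision linear interpolation between two neighboring texel values.
__host__ __device__ inline float lerp_full(float a, float b, float t) {
    return a + t * (b - a);
}

// Emulate a fixed-point weight with 8 fractional bits: t is snapped to a
// multiple of 1/256. ASSUMPTION: round-to-nearest; real hardware may differ.
__host__ __device__ inline float quantize_weight(float t) {
    return floorf(t * 256.0f + 0.5f) / 256.0f;
}

// Same lerp, but with the quantized weight. The result is still a float;
// only the number of distinct positions between a and b is limited.
__host__ __device__ inline float lerp_quantized(float a, float b, float t) {
    return lerp_full(a, b, quantize_weight(t));
}
```

With this emulation, weights of 0.5 and 0.501 land on the same step, so `lerp_quantized(0.0f, 100.0f, t)` gives the same value for both, while `lerp_full` distinguishes them. That is the stepping you would see when a texture is magnified a lot.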