It depends on whether the data being read can be heavily cached; in that case texture bandwidth can be ~150 GB/s.
In any case, one should be able to get at least 60 GB/s on a one-to-one texture copy (see GPUbench, which uses float4s). In my own code I have noticed that switching from float2 to ushort2, which halves the data per element (8 bytes to 4) and so should make a factor-of-two difference if the fetch is bandwidth bound, has a much smaller effect. I had chalked this up to becoming latency bound, but perhaps the texture unit just isn’t very efficient with these data types?
It also seems possible that you could actually be write bound. Have you tried just writing out a constant and seeing how fast that goes?
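If it helps, here is a rough sketch of that write-only test in CUDA. It is not taken from GPUbench or from anyone's actual code; the kernel name `write_constant`, the helper `effective_gbps`, the buffer size, block size and repeat count are all made up for illustration. The kernel does no loads at all, so whatever GB/s it reports is a ceiling on what a copy kernel could achieve on the store side.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes).
static double effective_gbps(double bytes, double milliseconds) {
    return bytes / (milliseconds * 1.0e-3) / 1.0e9;
}

// Write-only kernel: each thread stores one float4 and reads nothing
// from global or texture memory.
__global__ void write_constant(float4 *out, size_t n, float4 value) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

int main() {
    const size_t n = (size_t)1 << 22;   // 4M float4 = 64 MiB (arbitrary size)
    const int threads = 256;            // arbitrary block size
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    const int reps = 20;                // arbitrary repeat count

    float4 *d_out = NULL;
    if (cudaMalloc((void **)&d_out, n * sizeof(float4)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    const float4 value = make_float4(1.0f, 2.0f, 3.0f, 4.0f);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not included in the timing.
    write_constant<<<blocks, threads>>>(d_out, n, value);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        write_constant<<<blocks, threads>>>(d_out, n, value);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // total time for all reps, in ms

    const double bytes = (double)n * sizeof(float4) * reps;
    printf("write-only: %.1f GB/s\n", effective_gbps(bytes, ms));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

If this number comes out close to what the full kernel achieves, the writes are the likely limit rather than the texture fetches. Changing `float4` to the output type your real kernel uses would make the comparison more direct.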