Issues with CUDA texture read using driver API

I am using the driver API call cuTexRefSetAddress2D() to bind memory to a texture. I get correct results as long as the texture is bound for 32-bit accesses, but wrong results when it is bound for 16-bit or 8-bit accesses. The value read comes from the vicinity of the correct location, but is off by one or two bytes.

In a parallel setup, where I use the runtime API call cudaBindTexture2D() for texture binding, I get correct results for all access sizes (32, 16 and 8 bit).
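To put a number on the "off by one or two bytes" part, a host-side reference check along the following lines can help. This is only a minimal sketch, not the code from my project: it assumes the bound memory is pitch-linear, that a copy of it is available on the host, and that a texel at (x, y) starts at byte y * pitch + x * sizeof(T). The names referenceFetch and findByteShift are hypothetical helpers made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: host-side reference fetch from a pitch-linear 2D buffer.
// T is the texel type (e.g. unsigned char, unsigned short, unsigned int);
// pitchBytes is the row stride in bytes.
template <typename T>
T referenceFetch(const unsigned char* base, std::size_t pitchBytes,
                 std::size_t x, std::size_t y) {
    assert((x + 1) * sizeof(T) <= pitchBytes);  // texel must lie inside the row
    T value;
    std::memcpy(&value, base + y * pitchBytes + x * sizeof(T), sizeof(T));
    return value;
}

// Hypothetical helper: given the value the kernel actually returned for (x, y),
// search byte offsets 0, +1, -1, +2, -2, ... up to maxShift around the expected
// position and report the first offset at which that value is found.
// Returns false if the value does not occur within the search window.
template <typename T>
bool findByteShift(const unsigned char* base, std::size_t pitchBytes,
                   std::size_t totalBytes, std::size_t x, std::size_t y,
                   T observed, int maxShift, int& shiftOut) {
    const std::ptrdiff_t expected =
        static_cast<std::ptrdiff_t>(y * pitchBytes + x * sizeof(T));
    for (int d = 0; d <= maxShift; ++d) {
        const int candidates[2] = {d, -d};
        for (int shift : candidates) {
            const std::ptrdiff_t pos = expected + shift;
            // Skip candidates that would read outside the buffer.
            if (pos < 0 ||
                static_cast<std::size_t>(pos) + sizeof(T) > totalBytes) {
                continue;
            }
            if (std::memcmp(base + pos, &observed, sizeof(T)) == 0) {
                shiftOut = shift;
                return true;
            }
        }
    }
    return false;
}
```

Comparing each value returned through the driver API path against referenceFetch, and running findByteShift on the mismatches, would show whether the offset is constant or depends on x. The result is only meaningful if the test data makes nearby byte patterns distinguishable, for example a buffer filled with an incrementing byte pattern.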

Any pointers regarding this?

I am compiling for compute capability 3.0 with CUDA 5.5 and running on a GTX 650 Ti card, with Ubuntu on the host PC.

Sample code and further details are posted here: http://stackoverflow.com/questions/19315548/issues-with-cuda-texture-read-using-driver-api

Could one of the NVIDIA engineers look into this, please? This looks like a potential bug to me. If more clarification or supporting detail is needed, please leave a comment.