Issues with CUDA texture read using driver API

I am using the driver API function cuTexRefSetAddress2D() to bind a pitched 2D buffer to a texture. I get correct results as long as I bind with 32-bit elements, but wrong results when I bind with 16- or 8-bit elements. The value read comes from the vicinity of the correct location, but is off by one or two bytes.
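For reference, this is a minimal sketch of the driver-API binding path I am describing (the module handle, texref name, and buffer dimensions are illustrative, not copied from my actual code):

```cuda
#include <cuda.h>

// Hedged sketch: bind a pitched 8-bit 2D buffer via the driver API.
// "tex" is assumed to be declared in the module as
//   texture<unsigned char, 2, cudaReadModeElementType> tex;
void bindDriverApi(CUmodule module, CUdeviceptr devPtr,
                   size_t width, size_t height, size_t pitchInBytes)
{
    CUtexref texRef;
    cuModuleGetTexRef(&texRef, module, "tex");

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format      = CU_AD_FORMAT_UNSIGNED_INT8;  // 8-bit path; the 16/32-bit
                                                    // cases use the matching formats
    desc.NumChannels = 1;
    desc.Width       = width;                       // in elements, not bytes
    desc.Height      = height;

    // Format is set both on the texref and in the descriptor.
    cuTexRefSetFormat(texRef, CU_AD_FORMAT_UNSIGNED_INT8, 1);
    cuTexRefSetAddress2D(texRef, &desc, devPtr, pitchInBytes);
}
```

The 32-bit case uses the same sequence with CU_AD_FORMAT_UNSIGNED_INT32 and works correctly; only the 8- and 16-bit variants misread.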

In a parallel setup, where I use the runtime API cudaBindTexture2D() for texture binding instead, I get correct results for all element sizes (32, 16, and 8 bit).
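The working runtime-API equivalent looks roughly like this (again with illustrative names and sizes):

```cuda
#include <cuda_runtime.h>

// Hedged sketch: the runtime-API binding that gives correct results
// for the same pitched 8-bit buffer.
texture<unsigned char, 2, cudaReadModeElementType> tex;

void bindRuntimeApi(const void *devPtr,
                    size_t width, size_t height, size_t pitchInBytes)
{
    cudaChannelFormatDesc chDesc = cudaCreateChannelDesc<unsigned char>();
    size_t offset = 0;
    cudaBindTexture2D(&offset, tex, devPtr, chDesc,
                      width, height, pitchInBytes);
}
```

With this path, tex2D() reads in the kernel return the expected values for 8-, 16-, and 32-bit element types.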

Any pointers regarding this?

I am compiling for compute capability 3.0 with CUDA 5.5 and running on a GTX 650 Ti card, with Ubuntu on the host PC.

Sample code and further details are posted here.

Could one of the Nvidia engineers look into this, please? It looks like a potential bug to me. If more clarification or code is needed, please leave a comment.