cudaTextureObject_t 10% slower than texture<T>?

Is a 10% performance loss normal? Profiling showed that the cudaTextureObject_t version issues more requests (115 MReq) than the plain texture reference version (93 MReq). The device code is practically the same; the two versions differ only in the fetch calls: tex1Dfetch(cudaTextureObject_t, …) vs. tex1Dfetch(texture, …).

Shouldn’t the total MReq be the same? I see this odd behavior with uint2 textures over linear memory.
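
To make the comparison concrete, here is a minimal sketch of the two fetch paths, assuming a plain copy kernel over a linear uint2 buffer. The kernel names, buffer size, and launch configuration are illustrative assumptions, not the code that was profiled. The texture reference path only builds with toolkits older than CUDA 12, where that API was removed.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Legacy file-scope texture reference (removed in CUDA 12).
texture<uint2, cudaTextureType1D, cudaReadModeElementType> texRef;

// Hypothetical kernel: fetch through the texture reference.
__global__ void copyViaReference(uint2 *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch(texRef, i);
}

// Hypothetical kernel: same work, fetch through a texture object.
__global__ void copyViaObject(cudaTextureObject_t texObj, uint2 *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<uint2>(texObj, i);
}

int main() {
    const int n = 1 << 20;                 // illustrative size
    const size_t bytes = n * sizeof(uint2);
    uint2 *src = NULL, *dst = NULL;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);
    cudaMemset(src, 0, bytes);

    // Path 1: bind the linear buffer to the texture reference.
    cudaBindTexture(NULL, texRef, src, bytes);

    // Path 2: describe the same buffer as a texture object.
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = src;
    resDesc.res.linear.desc = cudaCreateChannelDesc<uint2>();
    resDesc.res.linear.sizeInBytes = bytes;

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t texObj = 0;
    cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copyViaReference<<<blocks, threads>>>(dst, n);
    copyViaObject<<<blocks, threads>>>(texObj, dst, n);
    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaUnbindTexture(texRef);
    cudaDestroyTextureObject(texObj);
    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```

Profiling the two kernels separately in a build like this would show whether the request counts diverge even when the fetch call is the only difference.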

I’m using CUDA 6 x64, VS 2012, Win 8, Driver 335.23 and a GTX 780 Ti amp (compute_35,sm_35).