How can I measure texture memory latency?

Is it similar to the global memory case or not?

How should I set the number of threads and blocks? Previously I launched a single block with a single thread, and the result seemed unreasonable. But if I use a large block size, the latencies of different threads would overlap, and that would distort the result.

So I’m confused about the setup.

In someone’s program for measuring global memory latency, every loop iteration contains start_time ^= value[idx] and end_time = clock(). How do these work?
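
To make the question concrete, here is a minimal sketch of the kind of timing loop I mean. It is a reconstruction under assumptions, not the original program: the kernel name chase_latency, the array names, the constants and the <<<1, 1>>> launch are all hypothetical, and it uses a plain dependent-load (pointer-chasing) loop with clock64() rather than the exact expressions quoted above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// All three values are assumptions picked for illustration only.
constexpr int ITERS  = 256;      // number of timed dependent loads
constexpr int N      = 1 << 20;  // chain length in elements
constexpr int STRIDE = 32;       // elements skipped per hop

// Hypothetical kernel: one thread walks a chain in global memory.
// Each load address comes from the previous load, so the loads of this
// thread cannot overlap with each other.
__global__ void chase_latency(const unsigned int *chain,
                              unsigned int *sink,
                              long long *cycles)
{
    unsigned int idx = 0;
    unsigned int acc = 0;
    long long total = 0;

    for (int i = 0; i < ITERS; ++i) {
        long long start_time = clock64();
        idx = chain[idx];   // dependent global load
        acc ^= idx;         // consume the value; the intent is that the load
                            // finishes before end_time is read (whether the
                            // compiler keeps this order should be checked in
                            // the generated SASS)
        long long end_time = clock64();
        total += end_time - start_time;
    }

    *sink = acc;            // keep the loads from being optimized away
    *cycles = total;
}

int main()
{
    // Build the chain on the host: element i points STRIDE elements ahead.
    std::vector<unsigned int> h_chain(N);
    for (int i = 0; i < N; ++i)
        h_chain[i] = (i + STRIDE) % N;

    unsigned int *d_chain = nullptr, *d_sink = nullptr;
    long long *d_cycles = nullptr;
    cudaMalloc(&d_chain, N * sizeof(unsigned int));
    cudaMalloc(&d_sink, sizeof(unsigned int));
    cudaMalloc(&d_cycles, sizeof(long long));
    cudaMemcpy(d_chain, h_chain.data(), N * sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    // One block, one thread: nothing else is running to hide the latency.
    chase_latency<<<1, 1>>>(d_chain, d_sink, d_cycles);

    long long h_cycles = 0;
    cudaError_t err = cudaMemcpy(&h_cycles, d_cycles, sizeof(long long),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("average cycles per load: %.1f\n", (double)h_cycles / ITERS);

    cudaFree(d_chain);
    cudaFree(d_sink);
    cudaFree(d_cycles);
    return 0;
}
```

The reported average also includes the cost of the two clock64() reads, so it is an upper bound rather than the pure load latency. I assume a texture version would keep the same loop and replace the load with a fetch such as tex1Dfetch through a texture object, but I have not verified that this gives a meaningful number.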