Our app uses a 3D float texture, with 192 threads per multiprocessor (MP).
Each thread accesses unique x,y,z locations.
However, each thread's subsequent accesses are close to the x,y,z locations that the same thread accessed previously.
Assuming a texture miss results in 8 float accesses (8 directions) per thread, 192 threads will initially bring in 192 * 8 * 4 = 6144 bytes ~ 6K.
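Spelled out, the estimate is just threads × floats per miss × bytes per float. This is a sketch of the back-of-envelope arithmetic only; the 8 floats per miss is the assumption being asked about, not a documented hardware figure, and `working_set_bytes` is a hypothetical helper name.

```python
FLOAT_BYTES = 4  # size of a 32-bit float texel


def working_set_bytes(threads: int, floats_per_miss: int) -> int:
    """Bytes pulled into the texture cache if every thread misses once
    and each miss fetches `floats_per_miss` floats (an assumption)."""
    return threads * floats_per_miss * FLOAT_BYTES


if __name__ == "__main__":
    b = working_set_bytes(192, 8)
    print(f"{b} bytes (~{b / 1024:.1f} KB)")  # 6144 bytes (~6.0 KB)
```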
That is already fairly close to 8K, the size of the texture cache…
However, if the hardware fetches more than 8 floats per texture miss, the cache could overflow, and subsequent accesses may NOT really be cachy-cachey (i.e. they would miss again)…
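Turning this around gives the break-even point: with 192 threads and an 8K cache, each thread can afford at most 8192 // (192 * 4) = 10 floats per miss before the first-miss footprint no longer fits. This is a rough sketch assuming no sharing of fetched data between threads and a fully usable cache (it ignores cache lines, associativity and replacement policy, which are not documented here); the helper names are hypothetical.

```python
CACHE_BYTES = 8 * 1024  # texture cache size per MP, as quoted in the post
THREADS = 192           # threads per MP, as in the post
FLOAT_BYTES = 4         # 32-bit float texel


def max_floats_per_miss(cache_bytes: int = CACHE_BYTES, threads: int = THREADS) -> int:
    """Largest per-thread fetch (in floats) whose combined first-miss
    footprint across all threads still fits in the cache."""
    return cache_bytes // (threads * FLOAT_BYTES)


def overflows(floats_per_miss: int, cache_bytes: int = CACHE_BYTES,
              threads: int = THREADS) -> bool:
    """True if one miss per thread would already exceed the cache size."""
    return threads * floats_per_miss * FLOAT_BYTES > cache_bytes


if __name__ == "__main__":
    print("max floats per miss:", max_floats_per_miss())  # 10
    for n in (8, 10, 11, 16):
        print(n, "floats/miss ->", "overflow" if overflows(n) else "fits")
```

Any overlap between neighbouring threads' fetches would shrink the real footprint, so this is a pessimistic bound rather than a prediction.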
Is the number of floats fetched per texture miss documented somewhere?