In the Fermi architecture, global memory and local memory are both cached, just as texture memory and constant memory are. In the past, I often copied data from global memory to texture memory because the access speed was faster, especially when reads were uncoalesced. Now that global memory is also cached, does that mean the performance of global memory and texture memory will be similar?
I think the only remaining advantages of texture memory are the hardware-accelerated normalized coordinates, automatic normalization of texture values, and linear interpolation.
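For anyone who hasn't used those features, here is a minimal sketch of what the texture path gives you for free, using the legacy texture-reference API (the kernel and all names are made up for illustration):

```
#include <cuda_runtime.h>

// 2D texture reference with normalized coordinates and hardware
// linear filtering.
texture<float, 2, cudaReadModeElementType> texRef;

__global__ void sample(float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Coordinates in [0,1]; the texture unit interpolates between
    // the four nearest texels in hardware.
    float u = (x + 0.5f) / w;
    float v = (y + 0.5f) / h;
    out[y * w + x] = tex2D(texRef, u, v);
}

int main()
{
    const int w = 256, h = 256;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *arr;
    cudaMallocArray(&arr, &desc, w, h);
    // ... fill arr with cudaMemcpyToArray ...

    texRef.normalized     = true;                  // coords in [0,1]
    texRef.filterMode     = cudaFilterModeLinear;  // hardware interpolation
    texRef.addressMode[0] = cudaAddressModeClamp;
    texRef.addressMode[1] = cudaAddressModeClamp;
    cudaBindTextureToArray(texRef, arr, desc);

    float *d_out;
    cudaMalloc(&d_out, w * h * sizeof(float));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    sample<<<grid, block>>>(d_out, w, h);
    cudaDeviceSynchronize();

    cudaUnbindTexture(texRef);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return 0;
}
```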
Most kernels I’ve tried have benefited from switching from tex1Dfetch reads to global memory reads. At least one saw reduced performance with the switch, so apparently some memory access patterns are still handled better by tex1Dfetch.
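For context, the switch is essentially between these two read paths; this is a minimal sketch with hypothetical kernel names, not my actual code:

```
#include <cuda_runtime.h>

// Texture-cache path: reads go through the tex cache via tex1Dfetch.
texture<float, 1, cudaReadModeElementType> inTex;

__global__ void copyViaTex(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch(inTex, i);
}

// Global-memory path: on Fermi these reads are cached in L1/L2.
__global__ void copyViaGlobal(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main()
{
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    // Binding linear memory to the texture reference is the only
    // extra step the texture path needs.
    cudaBindTexture(0, inTex, d_in, n * sizeof(float));

    dim3 block(256), grid((n + 255) / 256);
    copyViaTex<<<grid, block>>>(d_out, n);
    copyViaGlobal<<<grid, block>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaUnbindTexture(inTex);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```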
Oh, excellent question. I have never tried Fermi. But I think using textures still has many benefits when accessing data located in a CUDA array (optimized for 2D and 3D access).
I would appreciate it if you or somebody around here did a comparison of access speed between texture and global memory.
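I haven't run such a comparison myself, but a cudaEvent-based harness along these lines could drive one (a sketch only; it assumes the hypothetical copyViaTex/copyViaGlobal kernels from the snippet above):

```
#include <cstdio>
#include <cuda_runtime.h>

// Times 'reps' launches of each path and reports the averages.
void compare(const float *d_in, float *d_out, int n, int reps)
{
    dim3 block(256), grid((n + 255) / 256);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        copyViaTex<<<grid, block>>>(d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float texMs;
    cudaEventElapsedTime(&texMs, start, stop);

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        copyViaGlobal<<<grid, block>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float gldMs;
    cudaEventElapsedTime(&gldMs, start, stop);

    printf("tex1Dfetch: %.3f ms/launch, global: %.3f ms/launch\n",
           texMs / reps, gldMs / reps);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```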
Some more thoughts on this that I just remembered:
I asked Michael Garland about textures vs. L1 at his presentation at the VSCSE summer school last week. He confirmed what we are saying here: for the sparse matrix-vector multiply kernels he works on, sometimes L1 is better and sometimes the tex cache is better. The interesting thing he added is this: extra benefits are possible by making use of both caches in a single kernel. They are independent caches, after all! The idea is to read from one array with tex1Dfetch (or tex2D/3D) and from the others through L1: 1) it limits L1 cache pollution, and 2) it gives you a larger total amount of cache to read from.
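In code, the idea looks something like this (a minimal sketch; the kernel is hypothetical, with the first array bound to aTex on the host via cudaBindTexture):

```
// Array a comes in through the texture cache...
texture<float, 1, cudaReadModeElementType> aTex;

// ...while array b is read with a plain load, cached in L1 on Fermi.
// Each array gets its own cache, so they don't evict each other.
__global__ void dualCacheRead(const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(aTex, i) + b[i];
}
```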
I’ve only got one kernel that performs cached reads from two different arrays on which I could try this idea out; it did lead to a slight performance improvement. The improvement likely wasn’t that great because the second array read is not in the inner loop and is only performed once for every ~30-40 random inner-loop reads.
It is too bad that the tex cache is so shrouded in secrecy that we can’t know what access patterns work well for it. Even a cache line size would be something!