Hello,
I am trying to understand how to use texture memory for 2D data access.
According to the literature, texture memory in CUDA is optimized for 2D spatial locality. To verify that,
I created two simple kernels, one using texture memory and the other using global memory, each applying a 3x3 averaging filter to a 512x512 image,
as follows:
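The kernel code is not shown here, so the following is only a minimal sketch of what such a pair could look like, not the code that was actually profiled. It assumes a single-channel `float` image, the texture object API with clamp addressing and point filtering, and 16x16 thread blocks. The names `avgTex`, `avgGlobal` and `maxAbsDiff` are made up for illustration.

```cuda
#include <algorithm>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int W = 512, H = 512;  // image size from the post

// 3x3 mean via texture fetches; clamp addressing handles the borders.
__global__ void avgTex(cudaTextureObject_t tex, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float s = 0.f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            s += tex2D<float>(tex, x + dx + 0.5f, y + dy + 0.5f);
    out[y * w + x] = s / 9.f;
}

// Same filter via plain global loads; indices are clamped by hand.
__global__ void avgGlobal(const float* in, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float s = 0.f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = min(max(x + dx, 0), w - 1);
            int yy = min(max(y + dy, 0), h - 1);
            s += in[yy * w + xx];
        }
    out[y * w + x] = s / 9.f;
}

// Runs both kernels on a synthetic image and returns the largest
// absolute difference between their outputs (hypothetical helper).
float maxAbsDiff() {
    std::vector<float> img(W * H);
    for (int i = 0; i < W * H; ++i) img[i] = float(i % 251);
    size_t bytes = W * H * sizeof(float);

    // Linear global-memory input plus one output buffer per kernel.
    float *dIn, *dOutTex, *dOutGlb;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOutTex, bytes);
    cudaMalloc(&dOutGlb, bytes);
    cudaMemcpy(dIn, img.data(), bytes, cudaMemcpyHostToDevice);

    // Texture input: the same image copied into a cudaArray.
    cudaChannelFormatDesc ch = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &ch, W, H);
    cudaMemcpy2DToArray(arr, 0, 0, img.data(), W * sizeof(float),
                        W * sizeof(float), H, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;       // no interpolation
    td.readMode = cudaReadModeElementType;     // return raw floats
    td.normalizedCoords = 0;                   // texel coordinates
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    avgTex<<<grid, block>>>(tex, dOutTex, W, H);
    avgGlobal<<<grid, block>>>(dIn, dOutGlb, W, H);

    // The blocking copies also wait for both kernels to finish.
    std::vector<float> a(W * H), b(W * H);
    cudaMemcpy(a.data(), dOutTex, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), dOutGlb, bytes, cudaMemcpyDeviceToHost);

    float worst = 0.f;
    for (int i = 0; i < W * H; ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dIn);
    cudaFree(dOutTex);
    cudaFree(dOutGlb);
    return worst;
}
```

Calling `maxAbsDiff()` from `main` is a quick way to confirm that both variants compute the same filter before comparing their timings, so that any difference in the profile comes from the memory path rather than from the arithmetic.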
When I profiled these two kernels, these were my findings:
Metric | Texture | Global memory |
---|---|---|
Execution time | 2 ms | 0.9 ms |
To further investigate cache efficiency for both kernels, and global memory load efficiency for the global-memory kernel, I collected the following metrics:
Metric | Texture | Global |
---|---|---|
sm_efficiency Multiprocessor Activity | 99.92% | 99.80% |
achieved_occupancy | 0.96 | 0.88 |
ipc Executed IPC | 0.77 | 2.68 |
tex_cache_hit_rate Unified Cache Hit Rate | 93.26% | 62.47% |
gld_efficiency Global Memory Load Efficiency | NA | 61.46% |
What could be the reason for the lower IPC of the texture kernel compared to the global-memory one, and for the difference in execution time?