Can I read some values from global memory without caching them?
In my code I need to read offsets from the m table, but it is far too big to fit in any fast memory.
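For scale, here are the sizes implied by the kernel comments. This assumes a 4-byte `int` (so `sizeof(map_t)` is 8 bytes) and one byte per pixel for the `uint8_t` images:

$$
\begin{aligned}
|m| &= 18\,000\,000 \times 8\ \text{B} = 144\,000\,000\ \text{B} \approx 137\ \text{MiB},\\
|I| &= 544 \times 544 \times 1\ \text{B} = 295\,936\ \text{B} = 289\ \text{KiB},\\
|O| &= 360 \times 360 \times 1\ \text{B} = 129\,600\ \text{B} \approx 127\ \text{KiB}.
\end{aligned}
$$

So I and O together are only about 416 KiB. On average each I pixel is read about $18\,000\,000 / 295\,936 \approx 61$ times and each O pixel is updated about $18\,000\,000 / 129\,600 \approx 139$ times, while each m entry is touched exactly once.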
I have already improved performance more than 10x by cleverly sorting and grouping the m array, both to avoid serialized atomic operations and to get coalesced memory access, and by adding `const __restrict__` to the I and O pointers. However, my L1 and L2 hit rates still seem far too low, and most of the time the kernel is stalled. I suspect this is because the m values get stored in those caches even though each one is read only once per kernel, and they evict the I and O data that I need fast access to.
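A related knob for the eviction problem described here is the opposite direction: asking L2 to keep the small, hot buffers resident. The sketch below assumes CUDA 11 or newer and a GPU of compute capability 8.0 or higher (L2 persisting accesses do not exist on older parts), and uses the stream access policy window from the CUDA runtime. `keepInL2` is a hypothetical helper name, not part of the question's code.

```cuda
#include <algorithm>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: ask L2 to treat [base, base + bytes) as persisting
// for kernels launched in `stream`. Returns false when unsupported.
bool keepInL2(cudaStream_t stream, void* base, size_t bytes)
{
    int dev = 0, maxPersist = 0, maxWindow = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return false;
    cudaDeviceGetAttribute(&maxPersist, cudaDevAttrMaxPersistingL2CacheSize, dev);
    cudaDeviceGetAttribute(&maxWindow, cudaDevAttrMaxAccessPolicyWindowSize, dev);
    if (maxPersist <= 0 || maxWindow <= 0) return false;  // e.g. pre-sm_80 GPU

    // Set aside part of L2 for persisting lines, capped by the hardware limit.
    size_t setAside = std::min(bytes, static_cast<size_t>(maxPersist));
    if (cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, setAside) != cudaSuccess)
        return false;

    size_t window = std::min(bytes, static_cast<size_t>(maxWindow));
    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = base;
    attr.accessPolicyWindow.num_bytes = window;
    // Fraction of the window given the persisting property; keeping it at
    // or below setAside / window avoids thrashing the set-aside region.
    attr.accessPolicyWindow.hitRatio  =
        std::min(1.0f, static_cast<float>(setAside) / static_cast<float>(window));
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    return cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow,
                                  &attr) == cudaSuccess;
}
```

Usage would be something like `keepInL2(stream, dI, 544 * 544)` before launching the kernel in `stream`. A stream holds a single window, so I and O would need to live in one allocation to be covered together.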
So, can I read m directly from global memory without going through the caches?
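```cuda
// simplified version of the kernel
#include <cstdint>

struct map_t
{
    int offsetI;  // index of the source pixel in I
    int offsetO;  // index of the destination pixel in O
};

// I - 544*544 image
// O - 360*360 image
// m - constant 18000000 elements array
__device__ uint8_t doSomeStuff(uint8_t v);  // the real per-pixel work, defined elsewhere

__global__ void kernel(const uint8_t* __restrict__ I, uint8_t* __restrict__ O,
                       const map_t* __restrict__ m, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;  // n = number of entries in m
    O[m[tid].offsetO] += doSomeStuff(I[m[tid].offsetI]);
}
```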
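A minimal sketch of what reading m with a cache hint could look like, assuming CUDA's cache-hint load intrinsics. `__ldcs` ("cache streaming") allocates the line with an evict-first policy in L1 and L2, and `__ldg` uses the read-only load path for the gathered pixel. These are hints, not a guarantee that m never occupies cache. `doSomeStuff` (given a dummy body here), `remapKernel` and `runRemap` are hypothetical names for illustration, and `map_t` gets `__align__(8)` so an entry can be loaded as one `int2`.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Same fields as the question's map_t; __align__(8) lets one entry be
// fetched as a single int2 (x = offsetI, y = offsetO).
struct __align__(8) map_t
{
    int offsetI;
    int offsetO;
};

// Hypothetical stand-in for the question's doSomeStuff().
__device__ __forceinline__ uint8_t doSomeStuff(uint8_t v) { return v / 2; }

__global__ void remapKernel(const uint8_t* __restrict__ I,
                            uint8_t* __restrict__ O,
                            const map_t* __restrict__ m, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Streaming load: allocated evict-first in L1 and L2, so the read-once
    // m entries should displace less of the I/O working set.
    int2 e = __ldcs(reinterpret_cast<const int2*>(m) + tid);

    // Read-only (non-coherent) load path for the gathered source pixel.
    uint8_t v = __ldg(I + e.x);

    // Plain read-modify-write as in the question: assumes no two entries
    // in one launch target the same O pixel (otherwise this races).
    O[e.y] += doSomeStuff(v);
}

// Hypothetical host helper for trying the kernel on small inputs.
std::vector<uint8_t> runRemap(const std::vector<uint8_t>& hI,
                              std::vector<uint8_t> hO,
                              const std::vector<map_t>& hm)
{
    uint8_t* dI = nullptr;
    uint8_t* dO = nullptr;
    map_t* dm = nullptr;
    cudaMalloc(&dI, hI.size());
    cudaMalloc(&dO, hO.size());
    cudaMalloc(&dm, hm.size() * sizeof(map_t));
    cudaMemcpy(dI, hI.data(), hI.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(dO, hO.data(), hO.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(dm, hm.data(), hm.size() * sizeof(map_t), cudaMemcpyHostToDevice);

    const int n = static_cast<int>(hm.size());
    const int block = 256;
    remapKernel<<<(n + block - 1) / block, block>>>(dI, dO, dm, n);
    cudaMemcpy(hO.data(), dO, hO.size(), cudaMemcpyDeviceToHost);  // waits for the kernel

    cudaFree(dI);
    cudaFree(dO);
    cudaFree(dm);
    return hO;
}
```

A coarser, per-file alternative is compiling with `nvcc -Xptxas -dlcm=cg`, which asks ptxas to cache global loads in L2 only and skip L1. That affects every global load in the file, including the reads of I, which is why the per-load hint above is the more targeted choice.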