I have a kernel in which each thread intensively accesses a single row of a matrix, and that row is different for each thread.
Can I cache that row in some way? I don't want to use shared memory, because a thread's row isn't shared with the other threads in the same group. Each row is a fairly short array of floats, about 17-25 elements.
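To make the access pattern concrete, here is a minimal sketch of what I mean, written in CUDA as an assumption (the same idea applies to a private array in OpenCL). All names (`process_rows`, `MAX_COLS`, the summing loop) are hypothetical stand-ins, not my real code. It shows one option I am considering: copying the row into a per-thread local array. As far as I understand, the compiler may keep such an array in registers when every index is known at compile time, but that is not guaranteed and it may spill to local memory instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical compile-time upper bound on the row length (rows are ~17-25 floats).
constexpr int MAX_COLS = 32;

// Hypothetical kernel: thread i works only on row i of a row-major matrix.
__global__ void process_rows(const float* matrix, float* out, int n_rows, int n_cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;

    // Per-thread private copy of the row, padded with zeros up to MAX_COLS.
    // Fully unrolled loops give compile-time indices, which makes register
    // placement more likely (an assumption, not a guarantee).
    float r[MAX_COLS];
    #pragma unroll
    for (int j = 0; j < MAX_COLS; ++j)
        r[j] = (j < n_cols) ? matrix[row * n_cols + j] : 0.0f;

    // Stand-in for the "intensive" work: many passes over the same row.
    float acc = 0.0f;
    for (int pass = 0; pass < 100; ++pass) {
        #pragma unroll
        for (int j = 0; j < MAX_COLS; ++j)
            acc += r[j];
    }
    out[row] = acc;
}

int main()
{
    const int n_rows = 1024, n_cols = 20;
    float *matrix = nullptr, *out = nullptr;

    // Managed memory keeps the host-side setup short for this sketch.
    if (cudaMallocManaged(&matrix, n_rows * n_cols * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&out, n_rows * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n_rows * n_cols; ++i) matrix[i] = 1.0f;

    const int threads = 128;
    const int blocks = (n_rows + threads - 1) / threads;
    process_rows<<<blocks, threads>>>(matrix, out, n_rows, n_cols);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("out[0] = %f\n", out[0]);

    cudaFree(matrix);
    cudaFree(out);
    return 0;
}
```

Compiling with `nvcc -Xptxas -v` reports register and local-memory usage per kernel, which should show whether the array actually stayed in registers.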
Thanks for any help.