I’m currently managing to store a 187 by 66 single-precision matrix entirely on-chip. The actual goal is to do some heavy-duty work on a 240 by 66 matrix, again with everything on-chip. The register file combined with some shared memory has enough space for it.
I’m using a lot of unrolling to make sure nothing spills over into local memory. When I try to unroll further (beyond 187), the compiler starts getting unhappy:
“Advisory: Loop was not unrolled, too much code expansion”
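For readers unfamiliar with the pattern, here is a minimal sketch of the kind of unrolling described above. It is not the actual kernel: the slice size `N`, the helper `scaled_slice_sum`, the kernel `slice_kernel`, and the memory layout are all hypothetical names and choices made for illustration. The point it shows is that a per-thread array can only live in registers when every index is a compile-time constant, which is why the loops have to be fully unrolled; `#pragma unroll` is only a request, and the compiler may decline it.

```cpp
// Builds with nvcc as CUDA, or with a plain C++ compiler for a host-side check.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD  // plain C++ build: no CUDA qualifiers
#endif

constexpr int N = 24;  // hypothetical per-thread slice size, not from the post

// Hypothetical per-thread work: load a slice, scale it, and sum it.
HD inline float scaled_slice_sum(const float* slice, float scale) {
    // Stays in registers only if both loops below are fully unrolled,
    // so that r[i] is always addressed with a constant index.
    float r[N];

#pragma unroll
    for (int i = 0; i < N; ++i) {
        r[i] = slice[i] * scale;
    }

    float sum = 0.0f;
#pragma unroll
    for (int i = 0; i < N; ++i) {
        sum += r[i];
    }
    return sum;
}

#ifdef __CUDACC__
// Assumed layout: each thread owns N consecutive floats of the input.
__global__ void slice_kernel(const float* data, float* out, float scale) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    out[t] = scaled_slice_sum(data + t * N, scale);
}
#endif
```

If the compiler refuses to unroll one of these loops, indexing `r[i]` with a runtime `i` typically forces the array into local memory, which is exactly the spill the unrolling is meant to avoid.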
Has anyone experienced similar issues? Any workarounds?
Grateful for any advice!