Prevent the compiler from unrolling loops

Looking at what my kernel gets compiled to, it seems the compiler unrolls my loop for me.

I really don’t like this, since it uses too many registers and lowers my occupancy. Is there any way to prevent it?

Also, the additional loads never get hit because the array has a fixed size. I thought that making WIDTH and BLOCK_DIM compile-time constants would fix that, but somehow the compiler still adds loads that use registers and never get hit.

To keep a loop fully rolled, insert the following on the line just before the relevant for or while statement:

#pragma unroll 1
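To show where the pragma goes, here is a minimal sketch. The kernel (rowSum), the host helper (runRowSum), and the values of WIDTH and BLOCK_DIM are hypothetical, since the original kernel was not posted. Only the placement of the pragma matters.

```cuda
#include <cuda_runtime.h>

// Hypothetical values; the original post does not state them.
constexpr int WIDTH = 64;
constexpr int BLOCK_DIM = 128;

// Hypothetical kernel: each thread sums one row of WIDTH floats.
__global__ void rowSum(const float* in, float* out, int rows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float acc = 0.0f;
    // Request an unroll factor of 1, i.e. keep the loop rolled.
    #pragma unroll 1
    for (int i = 0; i < WIDTH; ++i) {
        acc += in[row * WIDTH + i];
    }
    out[row] = acc;
}

// Host helper: fills the input with ones, runs the kernel, and
// returns true if every row sums to WIDTH.
bool runRowSum(int rows)
{
    float* in = nullptr;
    float* out = nullptr;
    if (cudaMallocManaged(&in, rows * WIDTH * sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&out, rows * sizeof(float)) != cudaSuccess) {
        cudaFree(in);
        return false;
    }
    for (int i = 0; i < rows * WIDTH; ++i) in[i] = 1.0f;

    int blocks = (rows + BLOCK_DIM - 1) / BLOCK_DIM;
    rowSum<<<blocks, BLOCK_DIM>>>(in, out, rows);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    for (int r = 0; ok && r < rows; ++r) {
        ok = (out[r] == static_cast<float>(WIDTH));
    }
    cudaFree(in);
    cudaFree(out);
    return ok;
}
```

To check whether the pragma changed anything, compile with nvcc -Xptxas -v to print the per-kernel register count, and look at the output of cuobjdump -sass to see whether the loop body is still repeated.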

Recent compiler versions (last couple of years?) unroll more aggressively than older ones. This has annoyed me at times. However, according to careful measurements I took in several of these cases, the compiler made the right decision: the unrolled versions were faster than the rolled loop, though often barely so (1-2%).
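If you want to repeat that kind of comparison yourself, one possible approach is sketched below. It is not the method used for the measurements above; the kernel (scaleRows), the helper (timeKernelMs), and the value of ROW_WIDTH are all made up for illustration. The unroll factor is passed as a template parameter so the same loop can be built rolled and unrolled, and each build is timed with CUDA events.

```cuda
#include <cuda_runtime.h>

// Hypothetical row length, chosen only for this sketch.
constexpr int ROW_WIDTH = 64;

// Hypothetical kernel: scales each element of one row per thread.
// UNROLL = 1 keeps the loop rolled; UNROLL = ROW_WIDTH asks for a full unroll.
template <int UNROLL>
__global__ void scaleRows(float* data, int rows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    #pragma unroll UNROLL
    for (int i = 0; i < ROW_WIDTH; ++i) {
        data[row * ROW_WIDTH + i] *= 1.0001f;
    }
}

// Returns the average time per launch in milliseconds, measured with CUDA events.
template <int UNROLL>
float timeKernelMs(float* d_data, int rows, int reps)
{
    const int threads = 128;
    const int blocks = (rows + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // One untimed warm-up launch.
    scaleRows<UNROLL><<<blocks, threads>>>(d_data, rows);

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) {
        scaleRows<UNROLL><<<blocks, threads>>>(d_data, rows);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}
```

Calling timeKernelMs<1>(d_data, rows, 100) and timeKernelMs<ROW_WIDTH>(d_data, rows, 100) on the same device buffer gives two numbers to compare. With differences as small as 1-2%, expect to need many repetitions and several runs before the result means anything.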
