Serialize inner loop (CUDA C)


How can I make sure that the inner loop of a nested for loop does not get parallelized? I tried to use “seq”, but the compiler seems to ignore it. What is the right way to use it? Sample code is given below.

#pragma acc region
{
#pragma acc for independent
for ( int i = 0; i < outer; i++ ) {
     int temp = 0;
     for ( int y = 0; y < inner; y++ ) { // make this loop execute serially
          temp += array[y][i];
     }
     final[i] = temp;
}
}



Hi WalS,

You can try using the “kernel” clause on the outer loop.

#pragma acc for independent, kernel
for ( int i =0; i <outer; i ++) {

Though, you probably don’t need the independent clause here.

Hope this helps,

Hi Mat,

Thanks for the quick reply. I had also tried using “kernel”, but the compiler still parallelized the loop. After your suggestion, I tried it without the “independent” clause, but that made no difference.

Any other suggestions?

Thank you!

Hi WalS,

Can you please post a reproducing example? I’ll need to see the code in context to get a better idea of what’s going on.


Hello Mat,

It was a stupid mistake on my part: I was checking the wrong compiler report. The code is being compiled correctly. Sorry for the trouble. Thanks a ton!
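
For anyone landing on this thread later, here is a minimal sketch of the loop nest with the “kernel” clause applied as suggested above. It assumes the PGI Accelerator directive syntax used in the question; the function name column_sums and the fixed sizes OUTER and INNER are hypothetical and only there to make the fragment compile on its own. A compiler that does not recognize these directives should simply ignore the pragmas and run the loops serially on the host.

```c
#include <stdio.h>

/* Hypothetical sizes, chosen only so the sketch is self-contained. */
#define OUTER 4
#define INNER 3

/* Sums each column of array[INNER][OUTER] into final[OUTER]. */
static void column_sums(int array[INNER][OUTER], int final[OUTER])
{
#pragma acc region
    {
        /* "kernel" marks the outer loop as the body of the accelerator
           kernel, so the loop nested inside it is expected to run
           sequentially within each iteration of i. */
#pragma acc for kernel
        for (int i = 0; i < OUTER; i++) {
            int temp = 0;
            /* Inner loop: intended to stay serial. */
            for (int y = 0; y < INNER; y++) {
                temp += array[y][i];
            }
            final[i] = temp;
        }
    }
}
```

Calling column_sums on a 3x4 array with rows {1,2,3,4}, {5,6,7,8} and {9,10,11,12} should fill final with 15, 18, 21, 24. To confirm how the loops were actually scheduled, check the compiler report for the file you are really building; with the PGI compilers that report is requested with -Minfo=accel.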