How to Change Loop Scheduling

Dear Mat,

I am accelerating a section of code that looks like this:

!$acc region

!$acc do vector
         do i = i_start,i_end
            
!$acc do parallel
         do j = j_start,j_end
.......

The default scheduling the compiler gives me is:

Accelerator kernel generated
        278, !$acc do vector(32)
        283, !$acc do parallel
             Cached references to size [32] block of 'jeven'
             Cached references to size [32] block of 'jodd'
             CC 1.3 : 117 registers; 1044 shared, 964 constant, 112 local memory bytes; 6 occupancy

My “i” index can reach 270. As I understand it, each thread block on my GPU can have at most 512 threads.

I want to change the loop schedule suggested by the compiler so that each block launches 270 threads instead of 32.

I tried the following:

!$acc region

!$acc do vector(270)
         do i = i_start,i_end
            
!$acc do parallel
         do j = j_start,j_end
.......

But for some reason this causes the compiler to run the “i” loop sequentially:

 Accelerator kernel generated
        278, !$acc do seq
             Non-stride-1 accesses for array 'jeven'
             Non-stride-1 accesses for array 'jodd'
        283, !$acc do parallel
             CC 1.3 : 116 registers; 20 shared, 960 constant, 48 local memory bytes; 6 occupancy

Any idea how to get this working?

Thank you for your help.

Hi sindimo,

Can you please post a reproducing example here, or send one to PGI Customer Service (trs@pgroup.com)?

Adding a size to vector is the correct way to specify the vector length, but something in your code is preventing the compiler from using a size this large.

Note that it’s best to use vector sizes that are multiples of the GPU’s warp size (32), so you may want to try a vector size of 128, 256, or 512.
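To illustrate why multiples of 32 are preferred, here is a small standalone Fortran sketch (the module `warp_math` and the program `warp_fill` are hypothetical names, not part of the original code). Assuming the warp size of 32 mentioned above, it computes how many warps a block of a given vector length occupies and how many thread slots in the last, partially filled warp go unused. It only shows the warp arithmetic; it does not explain why the compiler fell back to `!$acc do seq` for vector(270).

```fortran
! Hypothetical helper (not from the original thread): warp arithmetic for a
! block of n threads, assuming a warp size of 32.
module warp_math
  implicit none
  integer, parameter :: warp_size = 32
contains
  ! Number of warps a block of n threads occupies (a partial warp counts as one).
  pure integer function warps_needed(n)
    integer, intent(in) :: n
    warps_needed = (n + warp_size - 1) / warp_size
  end function warps_needed

  ! Thread slots left idle in the last, partially filled warp.
  pure integer function idle_lanes(n)
    integer, intent(in) :: n
    idle_lanes = warps_needed(n) * warp_size - n
  end function idle_lanes
end module warp_math

program warp_fill
  use warp_math
  implicit none
  ! Candidate vector lengths: the suggested sizes plus the 270 from the question.
  integer, parameter :: candidates(4) = [128, 256, 270, 512]
  integer :: k

  do k = 1, size(candidates)
     print '(A,I0,A,I0,A,I0)', 'vector(', candidates(k), '): warps = ', &
          warps_needed(candidates(k)), ', idle lanes = ', idle_lanes(candidates(k))
  end do
end program warp_fill
```

For vector(270) this reports 9 warps with 18 idle lanes, while 128, 256, and 512 each fill every warp exactly (4, 8, and 16 warps, no idle lanes).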

Thanks,
Mat