Loop is parallelizable

Hi again

I am a student trying to learn more about GPUs, and I have a few questions about the following code:

!$acc region
do k = 1, n1
   do i = 1, n3
      y = 0
      do j = 1, n2
         y = y + a(i,j) * b(j,k)
      enddo
      c(i,k) = y
   enddo
enddo
!$acc end region

This code comes from the matrix multiplication sample provided by PGI. I have tried running it, but the innermost loop does not seem to be parallelized. If possible, could someone help me parallelize all three loops? The compiler output I receive is:

37, Loop is parallelizable
38, Loop is parallelizable
    Accelerator kernel generated
    37, !$acc do parallel, vector(16)
    38, !$acc do parallel, vector(16)
        CC 1.0 : 12 registers; 24 shared, 64 constant, 0 local memory bytes; 66 occupancy
        CC 1.3 : 12 registers; 24 shared, 64 constant, 0 local memory bytes; 100 occupancy
41, Loop is parallelizable
57, Loop interchange produces reordered loop nest: 57,59,58

If you are wondering why this code has been rewritten from the original:

!$acc region
do k = 1,n1
   do i = 1,n3
      c(i,k) = 0.0
      do j = 1,n2
         c(i,k) = c(i,k) + a(i,j) * b(j,k)
      enddo
   enddo
enddo
!$acc end region

The reason is that when I compiled the original code, I received the following message:

60, Complex loop carried dependence of ‘c’ prevents parallelization
    Loop carried reuse of ‘c’ prevents parallelization
    Inner sequential loop scheduled on accelerator

(On a side note, variables x and m were not accepted in the loops for some obscure reason) Please let me know if anyone has come across those messages.

Thank you for your time!

-Chris

Hi Chris,

The inner loop is not parallelizable. Since you’re a student, I’ll let you ponder a bit as to why. Please let me know what you come up with. If you still don’t see it, I’ll give you a clue.

> (On a side note, variables x and m were not accepted in the loops for some obscure reason) Please let me know if anyone has come across those messages.

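How are you using x and m? As loop index variables? Did you declare them as integer? If you didn’t declare them, Fortran’s implicit typing applies: names beginning with i through n default to integer and all other names default to real. An undeclared x would therefore be real and can’t be used as an index variable.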
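
A minimal sketch of the declaration issue, assuming x and m are meant to be loop indices. The program name, the bound n, and the variable total are made up for illustration and are not from your code:

```fortran
program index_types
  implicit none            ! disable implicit typing: undeclared names become compile errors
  integer, parameter :: n = 4   ! hypothetical loop bound
  integer :: x, m          ! loop indices declared explicitly as integer
  real :: total

  total = 0.0
  do x = 1, n
     do m = 1, n
        ! Convert the integer product to real before accumulating.
        total = total + real(x * m)
     enddo
  enddo
  print *, 'total =', total
end program index_types
```

With implicit none in place, forgetting the integer declaration shows up as a compile-time error instead of a silently real-typed x.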

-Mat

Thank you for your help, Mat!

My guess is that each iteration of the inner loop depends on the result of the previous iteration (the running sum in y), so the iterations must be done sequentially. I hope I hit the bull’s eye ;p

-Chris
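
To make the guess above concrete, here is a small self-contained sketch of the same loop nest in plain Fortran, without accelerator directives. The array sizes and fill values are assumptions chosen only so the program runs. The comments mark which loops are independent and where the value of y is carried from one iteration to the next:

```fortran
program inner_loop_dependence
  implicit none
  integer, parameter :: n1 = 3, n2 = 4, n3 = 2   ! hypothetical small sizes
  real :: a(n3,n2), b(n2,n1), c(n3,n1)
  real :: y
  integer :: i, j, k

  ! Fill the inputs with simple, reproducible values.
  do j = 1, n2
     do i = 1, n3
        a(i,j) = real(i + j)
     enddo
  enddo
  do k = 1, n1
     do j = 1, n2
        b(j,k) = real(j - k)
     enddo
  enddo

  do k = 1, n1
     do i = 1, n3
        ! Each (i,k) pair writes its own element of c and reads
        ! nothing written by another pair, so these two loops
        ! are independent of each other.
        y = 0.0
        do j = 1, n2
           ! Iteration j reads the y written by iteration j-1:
           ! a loop-carried dependence (a sum reduction).
           y = y + a(i,j) * b(j,k)
        enddo
        c(i,k) = y
     enddo
  enddo

  ! Sanity check against the matmul intrinsic.
  print *, 'max abs difference vs matmul:', maxval(abs(c - matmul(a, b)))
end program inner_loop_dependence
```

The printed difference should be zero or very close to it. The inner loop is a sum reduction: it can only be split across threads if the partial sums are combined afterwards, which is different from simply running each iteration independently.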