Problems switching from 15.10 to 16.7+

Hi,

I am currently trying to get my OpenACC code that was developed and validated with PGI 15.10 to work similarly with 16.7 or later.

The code compiled with 16.7/9 without any changes, but running some tests immediately showed that the runtime behavior has changed (crashes due to partially present variables, etc.).

So I took a step back and did a one-to-one comparison of the PGI compiler output between 15.10 and 16.7. I also read the release notes for PGI 2016 and expected that backward compatibility might be broken in my code, e.g. because data copyin is now handled as data present_or_copyin, whereas prior to 2016 the two clauses behaved differently.
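To illustrate what the copyin change can mean in practice, here is a minimal sketch, assuming hypothetical helpers `scale` and `run_example` that are not part of the actual code. Under present_or_copyin semantics, a copyin clause on data that is already present on the device does not transfer it again, so host-side changes have to be pushed explicitly with update device:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: writes s * a[i] into b[i] for i in [0, n).
// If copyin is treated as present_or_copyin and a is already on the
// device, no transfer of a happens at this point.
void scale(const double* a, double* b, int n, double s) {
    #pragma acc parallel loop copyin(a[0:n]) copyout(b[0:n])
    for (int i = 0; i < n; ++i) {
        b[i] = s * a[i];
    }
}

// Hypothetical driver showing the explicit refresh of present data.
std::vector<double> run_example(int n) {
    assert(n > 0);
    std::vector<double> a(n, 1.0), b(n, 0.0);
    double* pa = a.data();
    double* pb = b.data();

    // Make a present on the device (device copy holds 1.0).
    #pragma acc enter data copyin(pa[0:n])

    // Host-side modification after the data is already present.
    for (int i = 0; i < n; ++i) pa[i] = 2.0;

    // The copyin inside scale() will not re-transfer present data,
    // so the device copy is refreshed explicitly here.
    #pragma acc update device(pa[0:n])

    scale(pa, pb, n, 3.0);

    #pragma acc exit data delete(pa[0:n])
    return b;
}
```

Without an OpenACC compiler the pragmas are ignored and the code runs on the host with the same result, which makes it easy to compare the two builds.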

The compiler output already indicates that PGI 2016 implements several of my OpenACC statements differently. The following is a list of issues/questions:

  1. Does the order of the OpenACC related compiler output matter?

15.10

Generating enter data ...
Generating update device ...
Generating present ...

16.7

Generating present ...
Generating update device ...
Generating enter data ...

15.10 reports the “logical” order as given by the corresponding OpenACC statements in the code, whereas for 16.7 the order appears arbitrary. I am hoping the order does not affect what the runtime does in the end; otherwise I would expect errors.

  2. I was under the impression that in a kernels statement,
acc loop independent collapse(3)

for a triple loop nest would enforce parallelization (keyword independent), even where the compiler would otherwise anticipate a possible loop-carried dependence through some variable. I thought that with independent I could tell the compiler that I know better and can guarantee there is no dependence. Using independent works with 15.10 but not with 16.7, i.e. the compiler reports that it will execute the loop nest sequentially, which kills all my performance.
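A minimal sketch of the construct described in item 2, assuming a flat array layout and a hypothetical function `add_fields`; it is an illustration only, not the code from the actual application:

```cpp
#include <cassert>
#include <vector>

// Hypothetical pointwise update over an nx*ny*nz grid stored in a flat array.
void add_fields(const double* in, double* out, int nx, int ny, int nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    const int n = nx * ny * nz;

    #pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
    {
        // independent: the programmer asserts there are no loop-carried
        // dependences. collapse(3): the three tightly nested loops are
        // treated as one iteration space.
        #pragma acc loop independent collapse(3)
        for (int k = 0; k < nz; ++k) {
            for (int j = 0; j < ny; ++j) {
                for (int i = 0; i < nx; ++i) {
                    const int idx = (k * ny + j) * nx + i;
                    out[idx] = 2.0 * in[idx] + 1.0;
                }
            }
        }
    }
}
```

Compiling with -acc -Minfo=accel prints the kind of compiler feedback discussed in this thread, so the same small case can be compared between 15.10 and 16.7.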

  3. Another frequent issue in my code when switching to 16.7 is the message “Loop without integer trip count will be executed in seq mode”. In an earlier post

Posted: Wed Nov 18, 2015 9:41 am Post subject: strange behavior with enter data copyin

I had a related question and thought I understood the explanation. But in this case I am not using any enter data copyin statements before the loops. Those are plain simple loop nests and the loop bounds are class member variables, which I don’t explicitly copy to the device. Since the loop is so simple I guess it must have something to do with the class member variable being the bound. Did I overlook something in the release notes that can explain this changed behavior?

Thanks in advance for sharing any insights to help me overcome those issues.

Best Regards,
LS

Hi LS,

For #1. The order of the directives in your code does matter, and the compiler's output for those directives should match their line order in your file. Something else is going on here. Can you please give more details and, if possible, an example?

For #2. The independent clause disables compiler dependency analysis and tells it to go ahead and parallelize the loop. When used with a collapse clause, independent is applied to all of the tightly nested collapsed loops. Again, I suspect something else is causing your issue. And again, having more details and an example will help.

For #3. This message basically means that the loop trip count can change during the execution of the loop. What could be happening here is that your loop bounds variable is a pointer, or is a class member and you're calling a method from the device, in which case the compiler must assume the class member gets updated. If this is indeed the case here, the difference between compilers could be that we’ve greatly improved scoping analysis and 15.10 was most likely failing to detect this case.

Of course without details, this is just a guess.

  • Mat

Hi Mat,

Regarding #1, good to know. So there must be an issue and I will send you an email with more info.

Regarding #2, thanks for confirming my expectations. So again I will provide you with more info.

Regarding #3, I understood the explanation about the loop trip count. But from what I see the loop bound variable is not a pointer. It is actually a class member, but I am not calling the member function from the device (acc routine). Instead it is called from the host and inside the member function body I have my loop nest.
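A minimal sketch of the situation in #3, assuming a hypothetical class `Grid` with int members `nx_` and `ny_` as loop bounds. It also shows a common workaround, which is to copy the member bounds into local variables so the trip counts are plainly loop-invariant and are not read through the this pointer. Whether this removes the message in 16.7 for the actual code is not confirmed in this thread:

```cpp
#include <cassert>
#include <vector>

// Hypothetical class; nx_ and ny_ are plain int members used as loop bounds.
class Grid {
public:
    Grid(int nx, int ny) : nx_(nx), ny_(ny), data_(nx * ny, 0.0) {
        assert(nx > 0 && ny > 0);
    }

    // Member function called from the host; the loop nest lives in its body.
    void fill(double value) {
        // Local copies of the bounds and the data pointer, so the loop
        // bounds cannot be changed by writes made inside the loop.
        const int nx = nx_;
        const int ny = ny_;
        double* d = data_.data();

        #pragma acc kernels copyout(d[0:nx*ny])
        {
            #pragma acc loop independent collapse(2)
            for (int j = 0; j < ny; ++j) {
                for (int i = 0; i < nx; ++i) {
                    d[j * nx + i] = value;
                }
            }
        }
    }

    // Host-side accessor for checking results.
    double at(int i, int j) const { return data_[j * nx_ + i]; }

private:
    int nx_;
    int ny_;
    std::vector<double> data_;
};
```

Writing the loop bounds as `nx_` and `ny_` directly gives the variant described above, which makes for a small reproducer to compare the -Minfo=accel output of both compiler versions.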

Thanks,
LS