Hi,
I am currently trying to get my OpenACC code that was developed and validated with PGI 15.10 to work similarly with 16.7 or later.
The code compiled with 16.7 and 16.9 without any changes, but the first tests immediately showed that the runtime behavior has changed (crashes due to partially present variables, etc.).
So I took a step back and did a one-to-one comparison of the PGI compiler output between 15.10 and 16.7. I also read the release notes for PGI 2016 and expected that backward compatibility might be broken for my code, e.g. because data copyin is now handled as data present_or_copyin, whereas prior to 2016 the two clauses behaved differently.
The compiler output already indicates that PGI 2016 implements several of my OpenACC statements differently. The following is a list of issues/questions:
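To make the copyin point concrete, here is a minimal sketch of a nested copyin on the same array. The function name nestedCopyinSum and the array are hypothetical and not taken from my real code. If copyin is treated as present_or_copyin, as the release notes describe, the inner clause should find the data already present. Without an OpenACC compiler the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: sums a vector inside nested data regions.
double nestedCopyinSum(const std::vector<double>& v)
{
    assert(!v.empty());  // the sketch assumes a non-empty array
    const double* p = v.data();
    const int n = static_cast<int>(v.size());
    double sum = 0.0;

    // Outer region: p[0:n] is allocated on the device and copied in.
    #pragma acc data copyin(p[0:n])
    {
        // Inner copyin on the same array. Under present_or_copyin
        // semantics the data is already present, so no second
        // allocation or transfer is expected here.
        #pragma acc parallel loop copyin(p[0:n]) reduction(+:sum)
        for (int i = 0; i < n; ++i)
            sum += p[i];
    }
    return sum;
}
```

Calling nestedCopyinSum on a small vector and comparing against a host-side sum is a quick way to check that both compiler versions give the same result for this pattern.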
- Does the order of the OpenACC related compiler output matter?
15.10
Generating enter data ...
Generating update device ...
Generating present ...
16.7
Generating present ...
Generating update device ...
Generating enter data ...
15.10 reports the “logical” order given by the corresponding OpenACC statements in the code, whereas with 16.7 the order can be arbitrary. I am hoping the order plays no role in what the runtime ultimately does; otherwise I would expect errors.
- I was under the impression that in a kernels statement,
acc loop independent collapse(3)
for a triple loop nest would enforce parallelization (keyword independent), even if the compiler would otherwise assume a possible loop-carried dependence on some variable. I thought that with independent I could tell the compiler that I know better and can guarantee there is no dependence. This works with 15.10 but not with 16.7: the compiler reports that it will execute the loop nest sequentially, which kills all my performance.
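For reference, the pattern I mean looks roughly like the following sketch. The function scale3d, its arguments, and the flattened array layout are hypothetical stand-ins for my actual loops. Without an OpenACC compiler the pragmas are ignored and the nest runs sequentially on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: scale every element of a flattened nx*ny*nz array.
void scale3d(std::vector<double>& field, int nx, int ny, int nz, double factor)
{
    assert(field.size() == static_cast<std::size_t>(nx) * ny * nz);
    double* a = field.data();      // raw pointer for the data clause
    const int n = nx * ny * nz;

    #pragma acc kernels copy(a[0:n])
    {
        // "independent" asserts that no iteration depends on another;
        // collapse(3) merges the three loops into one iteration space.
        #pragma acc loop independent collapse(3)
        for (int i = 0; i < nx; ++i)
            for (int j = 0; j < ny; ++j)
                for (int k = 0; k < nz; ++k)
                    a[(i * ny + j) * nz + k] *= factor;
    }
}
```

Each iteration writes a distinct element, so the independent assertion holds for this sketch.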
- Another frequent issue in my code after switching to 16.7 is the message “Loop without integer trip count will be executed in seq mode”. In an earlier post
Posted: Wed Nov 18, 2015 9:41 am Post subject: strange behavior with enter data copyin
I had a related question and thought I understood the explanation. But in this case I am not using any enter data copyin statements before the loops. These are plain, simple loop nests whose bounds are class member variables, which I don’t explicitly copy to the device. Since the loops are so simple, I guess it must have something to do with the class member variables being the bounds. Did I overlook something in the release notes that explains this changed behavior?
Thanks in advance for sharing any insights to help me overcome these issues.
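As an illustration of the pattern, and of one workaround I could try, the sketch below copies the member bounds into local variables before the loop nest, so the loops no longer read the bounds through the this pointer. The class Grid and its members nx_, ny_, and data_ are hypothetical. Whether this actually restores a countable trip count with 16.7 is an assumption on my part, not something confirmed by the release notes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class; all names are illustrative and not from the real code.
class Grid {
public:
    Grid(int nx, int ny)
        : nx_(nx), ny_(ny), data_(static_cast<std::size_t>(nx) * ny, 0.0) {}

    void fill(double value) {
        // Copy the member bounds into locals so the loops do not read
        // this->nx_ / this->ny_ inside the compute region.
        const int nx = nx_;
        const int ny = ny_;
        const int n = nx * ny;
        double* d = data_.data();

        #pragma acc kernels copy(d[0:n])
        {
            #pragma acc loop independent collapse(2)
            for (int i = 0; i < nx; ++i)
                for (int j = 0; j < ny; ++j)
                    d[i * ny + j] = value;
        }
    }

    // Host-side accessor with a bounds check.
    double at(int i, int j) const {
        assert(i >= 0 && i < nx_ && j >= 0 && j < ny_);
        return data_[static_cast<std::size_t>(i) * ny_ + j];
    }

private:
    int nx_, ny_;
    std::vector<double> data_;
};
```

Without an OpenACC compiler the pragmas are ignored and fill runs on the host, which at least allows checking the logic independently of the compiler version.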
Best Regards,
LS