Code works with PGI_ACC_DEBUG=1 but fails without it

I am currently trying to get my OpenACC code to work without having to use managed memory.

I have the code in a state where it compiles fine, but fails with incorrect results (with managed memory turned on, the code works fine).

Strangely, if I set PGI_ACC_DEBUG=1, the code works perfectly and gets the correct results.

Is there something that PGI_ACC_DEBUG does that could help me find my problem?

Thanks

Hi sumseq,

The only thing PGI_ACC_DEBUG would do extra is add more synchronization.

Are you using “async”? If so, you might be copying data via an update directive before the compute region has finished updating the data.
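As a contrived C sketch of that hazard (the array `a` and the queue number are just placeholders, not your code):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 1000;
    double *a = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) a[i] = 1.0;

    #pragma acc enter data copyin(a[0:n])

    /* Kernel runs asynchronously on queue 1. */
    #pragma acc parallel loop async(1) present(a[0:n])
    for (int i = 0; i < n; i++)
        a[i] *= 2.0;

    /* BUG pattern: an update here, with no wait, can copy the array
       back before the async kernel has finished writing it. */

    /* Fix: drain queue 1 first, then copy. */
    #pragma acc wait(1)
    #pragma acc update self(a[0:n])

    #pragma acc exit data delete(a[0:n])
    printf("a[0] = %f (expect 2.0)\n", a[0]);
    free(a);
    return 0;
}
```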

Also, up until recently, using managed memory caused the runtime to ignore “async”. We lifted this restriction in PGI 17.7 when running with CUDA 8 on P100s, since there was no longer a danger of segfaults when accessing the same memory on both the host and device.

If you’re not using “async”, then my best guess is that you’re missing an “update” directive someplace and one of your device arrays isn’t synchronized with the host copy of the array.
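For example, a stale-copy pattern like this (hypothetical array, not from your code) gives wrong device results while the host side looks fine:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 1000;
    double *a = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) a[i] = 1.0;

    #pragma acc enter data copyin(a[0:n])

    /* Host modifies its copy after the device copy was created... */
    for (int i = 0; i < n; i++) a[i] = 5.0;

    /* ...so without this update, the kernel below reads the stale 1.0: */
    #pragma acc update device(a[0:n])

    #pragma acc parallel loop present(a[0:n])
    for (int i = 0; i < n; i++)
        a[i] += 1.0;

    #pragma acc update self(a[0:n])
    #pragma acc exit data delete(a[0:n])
    printf("a[0] = %f (expect 6.0)\n", a[0]);
    free(a);
    return 0;
}
```

Managed memory hides this class of bug because there is only one copy of the data.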

-Mat

Hi,

I am not using any asyncs, but knowing that the debug mode adds more synchronization is a good place to start my bug-squashing hunt.

(I am using PGI 17.9, but the problem exists using 17.4 as well).

Could a race condition within a parallel region be the culprit?

Is there anything I could look for in the PGI_ACC_NOTIFY=2 output when using managed memory to see when/where the managed memory is doing an update/sync? There is a ton of output there and I am not sure what to grep for.

I don’t think PGI_ACC_NOTIFY is going to help here. It only reports what the PGI runtime is doing; UVM is managed by the CUDA driver, so its transfers wouldn’t show up. Plus, this only shows which updates occur, while what you need to know is which update you’re missing (assuming that’s the cause).

Could a race condition within a parallel region be the culprit?

I guess it’s possible, but PGI_ACC_DEBUG only affects synchronization between kernel launches; it has no effect on the kernel (the parallel region) itself.
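For reference, the kind of in-kernel race I mean looks like this contrived sketch, where a missing reduction clause lets every iteration update the same scalar concurrently:

```c
#include <stdio.h>

int main(void) {
    int n = 1000;
    double sum = 0.0;

    /* Racy version (don't do this): every iteration writes sum
       concurrently, so results can vary from run to run:

       #pragma acc parallel loop copy(sum)
       for (int i = 0; i < n; i++) sum += 1.0;                    */

    /* Fixed version: the reduction clause combines partial sums safely. */
    #pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0;

    printf("sum = %f (expect 1000.0)\n", sum);
    return 0;
}
```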

I’m leaning towards a missing “update” or an uninitialized device array, though that doesn’t explain why PGI_ACC_DEBUG works.

The way “-ta=tesla:managed” works is that the compiler simply replaces the underlying memory allocator (malloc, new, allocate) with a call to cudaMallocManaged, and you don’t need to use it on all files. So one thought is to compile everything without managed, and then do a binary search: add managed back to half the files at a time until you can pin down the one or more files that, when compiled with managed, allow the code to pass. That should give you a list of potential arrays to track.
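Conceptually, the substitution is along these lines (a rough C sketch with an illustrative alloc_array helper, ignoring error checking):

```c
#include <cuda_runtime.h>
#include <stdlib.h>

/* Roughly what -ta=tesla:managed does behind the scenes: the host
   allocator is swapped for CUDA managed (unified) memory, so the same
   pointer is valid on both the host and the device. */
double *alloc_array(size_t n, int use_managed) {
    double *p = NULL;
    if (use_managed)
        cudaMallocManaged((void **)&p, n * sizeof(double), cudaMemAttachGlobal);
    else
        p = malloc(n * sizeof(double));
    return p;
}
```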

Then recompile without managed and add “update” directives for these arrays before and after each parallel region where they are used. If it starts passing, iteratively take the update directives back out until it starts failing again. Then you’ll know where the missing update goes.
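The bracketing pattern would look like this for a hypothetical array “a”:

```c
/* Bracket each suspect parallel region: force the device copy current
   before it, and the host copy current after it. If the code passes with
   these in place, remove them pair by pair until it fails again. */
void scale_bracketed(double *a, int n) {
    #pragma acc update device(a[0:n])

    #pragma acc parallel loop present(a[0:n])
    for (int i = 0; i < n; i++)
        a[i] *= 2.0;

    #pragma acc update self(a[0:n])
}
```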


Another tactic you can try is the environment variables “PGI_ACC_FILL=1” and “PGI_ACC_FILL_VALUE=<value>”. These cause the PGI runtime to initialize all allocated device data with the fill value (the default being zero). My one thought is that maybe an array is getting zeroed out when it’s created using UVM and PGI_ACC_DEBUG, but is uninitialized otherwise. I’m just guessing, but it’s an easy thing to try.
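The failure mode I have in mind looks like this contrived sketch, since “create” allocates device memory without initializing it:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 1000;
    double *a = malloc(n * sizeof(double));

    /* "create" allocates device memory but copies nothing, so the device
       copy of a holds garbage unless something writes it first. */
    #pragma acc enter data create(a[0:n])

    #pragma acc parallel loop present(a[0:n])
    for (int i = 0; i < n; i++)
        a[i] += 1.0;   /* BUG: reads uninitialized device data */

    /* PGI_ACC_FILL=1 papers over this by pre-filling every device
       allocation, which is why flipping it can change the results. */
    #pragma acc update self(a[0:n])
    #pragma acc exit data delete(a[0:n])
    printf("a[0] = %f (undefined without a fill)\n", a[0]);
    free(a);
    return 0;
}
```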

Hi,

WORKED! (sort of)

I have been relying on the compiler’s “zeroinit” option to initialize the GPU arrays to 0.

If I take this out and then use PGI_ACC_FILL=1, the code works perfectly!

Aren’t both of these supposed to do the same thing??

I would much prefer to rely on a compiler flag than an environment variable…

[If OpenACC included an “init(a)” clause for “enter data create”, this would be a lot easier…]
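Right now the closest portable thing I know of is an explicit device-side loop after the create, roughly:

```c
/* Hypothetical helper: the manual stand-in for an init clause. */
void create_and_zero(double *a, int n) {
    #pragma acc enter data create(a[0:n])
    /* Zero the device copy in place; the host copy is untouched. */
    #pragma acc parallel loop present(a[0:n])
    for (int i = 0; i < n; i++)
        a[i] = 0.0;
}
```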

I just tested “zeroinit” on a toy program and it worked fine for me. Not sure why it’s not working in your case. If possible, could you send a reproducing example to PGI Customer Service (trs@pgroup.com) so we can investigate?