Common blocks in OpenMP

Hello,

In my last thread (here), I mentioned a problem I was seeing with common blocks in the OpenMP paradigm. After some testing, I’m out of ideas, so I’d appreciate some advice.

The code I’m using makes use of common blocks. What I think is happening is that when subroutines inside the parallel region use these common blocks, they refer back to the global values – even if the individual variables have been declared as “private”. If, on the other hand, I make the common blocks “threadprivate”, the code outside of the parallel region cannot use those variables to initialize them; subroutines (outside the parallel region) that should be seeing values for those variables are getting 0’s instead.

Is there a way in OpenMP to (1) get the compiler to create local copies of common blocks for use in a parallel region, (2) initialize those variables according to any pre-existing values in the global common blocks, and (3) distinguish, based on where the code is executing, which version of the common block to use?

Hi dcwarren,

If, on the other hand, I make the common blocks “threadprivate”, the code outside of the parallel region cannot use those variables to initialize them; subroutines (outside the parallel region) that should be seeing values for those variables are getting 0’s instead.

In the serial portions of the code, a threadprivate common block should hold the same values as the master thread's copy, since the serial code executes on the master thread. I'm not sure why you're only seeing zeros. Are you initializing the common block before entering the first parallel region?
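
If the goal is for each thread to start from the values the serial code set up, the combination of "threadprivate" and "copyin" covers all three of your points: each thread gets its own copy of the common block, the copies are initialized from the master thread's values on entry to the region, and serial code continues to see the master's copy. A minimal sketch (the common block and variable names are invented for illustration):

      program tp_demo
        implicit none
        integer :: i
        real :: a, b
        common /params/ a, b
!$omp threadprivate(/params/)

        a = 1.0   ! serial code writes the master thread's copy
        b = 2.0

!$omp parallel do copyin(/params/)
        do i = 1, 100
           ! each thread enters with its own /params/, initialized
           ! from the master's values (a = 1.0, b = 2.0) by copyin
        end do
!$omp end parallel do

        print *, a, b   ! serial code again sees the master's copy
      end program tp_demo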

  • Mat

I ended up writing this issue off to bad memory management by the person who wrote the code. Common blocks had been added whenever one was needed, with whatever variables were handy at the time, so variables that needed to be private, shared, or constant were scattered in a hideous mix through all of the common blocks.


I spent the weekend rewriting the code to restore some sanity to the way this code manages its variables, and now I have a bizarre new problem. The code now loses the value of arguments to one of its subroutines. Here’s how the code looks (ignoring lots of structure around these lines, such as declarations, etc.):

...

!$omp parallel do default(shared)
     (do loop forming the parallel region)
!$omp end parallel do

  call foo(bar1, bar2, ...)

...



subroutine foo(bar1, bar2, ...)

do i = 1, bar2
  do ii = 1, bar1
    ...

What I’m seeing is that the variables bar1 and bar2 have values of 61 and 4 (respectively) throughout the OpenMP region and up to the call to subroutine foo. However, immediately upon entry into foo, both have a value of 0, making them quite useless as an upper bound to that do loop.

This only happens if the -mp flag is used at compilation. If I leave off that flag the code runs as expected. Any ideas?


Edit: Just stumbled across a comment that shared variables become undefined upon exit from an OpenMP region. If this is true, I guess my question is now the much easier, “How can I prevent this?”

Just stumbled across a comment that shared variables become undefined upon exit from an OpenMP region. If this is true, I guess my question is now the much easier, “How can I prevent this?”

This is only true when the shared variable is a pointer and associated with a private variable (See 2.9.3.2 of http://www.openmp.org/mp-documents/OpenMP3.1.pdf), so I doubt this is the problem.
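
For reference, the case the spec describes looks roughly like this (names invented):

      program ptr_demo
        implicit none
        real, target  :: t
        real, pointer :: p    ! shared by default

!$omp parallel private(t)
!$omp single
        t = 1.0
        p => t                ! shared pointer associated with a private variable
!$omp end single
!$omp end parallel

        ! here the association status of p is undefined: the private copy
        ! of t it pointed at ceased to exist when the region ended
      end program ptr_demo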

As to why this is occurring, I'm not sure; it doesn't really make much sense. How are you determining the values of bar1 and bar2? If you're looking at them in the debugger with optimization enabled, they may be held in registers, so the value the debugger prints may not be the value actually being used.

If that’s not it, I’d start using print statements to see where the values change to zero.

Sorry I can’t be more helpful,
Mat

I am using the debugger, but I've turned off optimization. One of the greybeards in my department suggested an array might be overstepping its bounds somehow, so I'll check for that, along with print/flush statements, to figure it out. Thanks for the speedy response.

Well, this is weird. Compiling with -C turns up nothing new (still just "ACCESS VIOLATION"), but I can't even print the problem variables with a "print *" without getting that same segfault error. I've even tried inlining the subroutine, with the same result.

Sounds like it might be a stack overflow. What happens if you increase your stack size?
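
If it's the worker threads' stacks that are overflowing rather than the main one, note that those are sized separately; with OpenMP 3.0 and later they can be set through the OMP_STACKSIZE environment variable, e.g. on Windows:

  set OMP_STACKSIZE=512M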

  • Mat

After some testing of the “-stack” flag, I’ve conclusively determined I don’t understand what the stack is and how it works. I thought the stack was the memory allocated in RAM for the program to use as scratch space for its calculations. As such, if I increased the stack size I should see a corresponding increase in memory used according to Windows Task Manager. Needless to say, this isn’t what I’m seeing.

The code will crash if I compile with anything less than “-stack=(no)check,6e8,6e8” (the first option makes no difference, and I’m saving you the trouble of counting zeroes). Regardless of (1) the amount of memory reserved for the stack and (2) whether I compile with “check” or “nocheck”, Task Manager tells me my code takes roughly 21MB of memory.

So now the code can run to completion, but there’s another issue: I don’t get the same results with exactly one OpenMP thread as I do compiling without “-mp”. The code does use a random number generator throughout, but why should it generate different states with/without OpenMP when there’s only ever one thread using it? (Testing with more than one thread will follow, but I want to make sure I understand the basics first.)


Edit: Hooray, more oddness! If I use more than one thread, I eventually get an out-of-bounds error at an array. Here’s the structure of the code throwing the error:

do i = 1, n_dummy
  x_local = x_array(i)
  ...

The error message says that I'm trying to access an index of x_array greater than its upper bound, which is 41; however, n_dummy is a runtime constant with value 4. Using the debugger (again without any optimization) I have checked that each thread has the correct value of n_dummy just prior to the loop. Within the loop, the iteration variable i has a value of 601028592 or so (and while that is greater than 41, it isn't the value the error message reports…). This is most certainly not between 1 and 4. Have you seen anything like this before?

Regardless of (1) the amount of memory reserved for the stack and (2) whether I compile with “check” or “nocheck”, Task Manager tells me my code takes roughly 21MB of memory.

The "check"/"nocheck" sub-option enables or disables the runtime code that dynamically commits more stack as it's needed. With "check", when the committed size is reached, more stack is committed, up until the reserve size is reached. With "nocheck", this runtime code is removed and it's assumed that you have specified a large enough commit size. It does not affect the size of the stack itself.

I don’t get the same results with exactly one OpenMP thread as I do compiling without “-mp”. The code does use a random number generator throughout, but why should it generate different states with/without OpenMP when there’s only ever one thread using it?

The main difference between compiling with and without “-mp” (excluding the OpenMP directives themselves) is that automatic arrays are allocated on the stack. I’d look for uninitialized memory with one of these arrays.
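
For reference, an automatic array is a local array sized by a dummy argument (or another non-constant expression), along these lines (names invented):

      subroutine work(n, total)
        implicit none
        integer, intent(in) :: n
        real, intent(out)   :: total
        real :: tmp(n)   ! automatic array: with -mp it is allocated on the
                         ! (per-thread) stack, so a large n can overflow it
        tmp = 1.0        ! stack memory is not zeroed; assign before use
        total = sum(tmp)
      end subroutine work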

Another possibility is that different optimizations are being applied. How different are the results and do they continue to be different when compiled without optimization (i.e. “-O0”)?

Within the loop, the iteration variable i has a value of 601028592 or so (and while it is greater than 41, this isn’t the value the error message reports…). This is most certainly not between 1 and 4. Have you seen anything like this before?

Is “i” private?
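
In a routine called from a parallel region, a local variable is private to each thread only if it's a true local: no SAVE attribute, no initializer in its declaration, and not in a (non-threadprivate) common block. Otherwise it's shared, and simultaneous updates from several threads would explain garbage values like that. A sketch of the distinction (names invented):

      subroutine demo(n, total)
        implicit none
        integer, intent(in)  :: n
        integer, intent(out) :: total
        integer :: i                 ! plain local: each calling thread gets its own
        integer, save :: ncalls = 0  ! SAVE attribute: one copy shared by every
                                     ! thread, a data race if called in parallel
        total = 0
        do i = 1, n                  ! safe: i is per-thread
           total = total + i
        end do
        ncalls = ncalls + 1          ! unsafe without synchronization
      end subroutine demo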

  • Mat

As far as I’m aware, there are no automatic arrays anywhere in the code; all arrays are shaped according to parameters set in a module.

Without any optimization, the results are still different. It's a Monte Carlo code tracking particles interacting with a shock structure, and what I'm seeing is different numbers of particles making it through each "gate" depending on whether I've enabled or disabled OpenMP. Broadly the results seem to be the same; I just can't think of any reason why a single OpenMP thread should differ from a serial run. Actually, I take that back: OpenMP leaves certain variables undefined after exiting the parallel region. Is there a short list of the conditions for this I can check? I'm already aware of private variables and shared pointers to private variables.

For lack of any better ideas, I’m changing my OpenMP region from “default(shared)” to “default(none)” just to make sure I have every variable assigned correctly.
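
Using the loop from my earlier post, the directive now has to name every variable explicitly, something like this, so the compiler will flag anything I forgot to scope:

!$omp parallel do default(none) shared(x_array, n_dummy) private(i, x_local)
      do i = 1, n_dummy
         x_local = x_array(i)
      end do
!$omp end parallel do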

Is “i” private?

It’s within a subroutine called from the OpenMP region, so I would assume yes. And OpenMP isn’t like OpenACC (where you can explicitly tell the code to have each encountering thread execute a particular loop serially), so I would expect that loops in OpenMP default to serial execution unless explicitly placed inside an OMP region. Am I right in this?

After a day and a half, I’ve learned something important:

Make sure your CRITICAL regions have only one point of entry and exit.

I enclosed a goto statement within a critical region, and when the code hit the goto it jumped out of the critical region without releasing the lock. So the next time any thread reached that region, it froze without an explicit reason.
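
Schematically, with invented names, the broken pattern and a conforming rewrite:

      ! Non-conforming: the goto branches out of the critical construct,
      ! so its lock is never released and the next thread to arrive hangs
!$omp critical
      if (done) goto 100
      counter = counter + 1
!$omp end critical

      ! Conforming: decide inside, branch only after exiting normally
!$omp critical
      skip = done               ! skip is a private flag for this thread
      if (.not. skip) counter = counter + 1
!$omp end critical
      if (skip) goto 100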

This has nothing to do with my array bounds issue, which still exists (I've used "default(none)" so every variable's scoping is explicit, and I get no errors at compilation). But at least I've learned something about OpenMP these last two days.