I’m attempting to convert some already working code to use multiple threads. I’m using the OpenMP for pragma to do this. Sometime after the new threads are forked, malloc is getting called. I’m beginning to suspect that perhaps it is not thread safe?
malloc is thread safe. What I suspect is going on is that you haven’t privatized these variables, so every thread is assigning the result of malloc to the same shared pointer. Try calling malloc outside of the parallel region or privatizing these variables.
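To illustrate the privatization point, here is a minimal sketch and not your actual code: the function name row_square_sums and the scratch buffer are made up for the example. The pointer is declared inside the parallel region, so each thread gets its own copy. If it were declared before the region without a private clause, it would be shared and every thread would overwrite the same pointer.

```c
#include <stdlib.h>

/* Hypothetical example: sums[r] receives the sum of squares of row r of
   the rows x cols matrix a. Returns 0 on success, -1 if a malloc failed. */
int row_square_sums(const double *a, int rows, int cols, double *sums)
{
    int failed = 0;

    /* The reduction gives each thread its own "failed" flag and ORs the
       flags together when the region ends. */
    #pragma omp parallel reduction(|:failed)
    {
        /* Declared inside the region: one pointer and one buffer per thread. */
        double *scratch = malloc((size_t)cols * sizeof *scratch);

        #pragma omp for
        for (int r = 0; r < rows; r++) {
            if (scratch == NULL) {      /* this thread has no buffer */
                failed = 1;
                continue;
            }
            for (int c = 0; c < cols; c++)
                scratch[c] = a[r * cols + c] * a[r * cols + c];

            double total = 0.0;
            for (int c = 0; c < cols; c++)
                total += scratch[c];
            sums[r] = total;
        }

        free(scratch);                  /* free(NULL) is a no-op */
    }
    return failed ? -1 : 0;
}
```

The code only uses pragmas, so it also builds and runs serially when OpenMP is not enabled, which makes it easy to compare the two builds.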
Hope this helps,
What if the variables in question are declared and malloced in a function called by the parallel region? I would have thought that would make them private.
If they are local variables, then yes, each thread would have its own private copy.
Have you tried using the PGI debugger, PGDBG, to determine the cause of the seg fault?
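As a rough sketch of that case (the names sum_to and sum_all are hypothetical, not taken from your code): a pointer declared as an ordinary local inside a function called from the parallel loop lives on the calling thread's stack, so each thread mallocs and frees its own buffer. Note that this only holds for automatic locals. A local declared static, or a file-scope variable, is still shared between threads.

```c
#include <stdlib.h>

/* Hypothetical worker called from inside the parallel loop.
   buf is an automatic local, so every call (and thread) has its own. */
long sum_to(int n)
{
    long *buf = malloc((size_t)(n + 1) * sizeof *buf);
    if (buf == NULL)
        return -1;                      /* allocation failed */

    for (int k = 0; k <= n; k++)
        buf[k] = k;

    long total = 0;
    for (int k = 0; k <= n; k++)
        total += buf[k];

    free(buf);
    return total;                       /* 0 + 1 + ... + n */
}

/* Each iteration writes a different element of out, so no two
   threads touch the same memory. */
void sum_all(long *out, int count)
{
    #pragma omp parallel for
    for (int i = 0; i < count; i++)
        out[i] = sum_to(i);
}
```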
Yeah, it seems to stop at the top of a for loop inside the called function. I can’t see anything wrong with any of the variables in the call. I’m not sure I’ve quite got the hang of debugging with multiple threads, so maybe I’m not seeing the whole picture. Are there any tutorials on how to pinpoint issues inside threads?
One thing I’ve gotten from the debugger is the following error message:
“jpgdbg newStack: function without stack address”. I get that message whenever I switch focus to a thread that has SigSegv’d. Some threads do make it past that point.
I’m not sure about the jpgdbg message, so I sent a note off to our lead tools engineer. He hasn’t seen this before, though, so he will need to do some investigation.
As for a tutorial on how to debug OpenMP programs, we unfortunately don’t have a standard tutorial. My suggestion is to first start with the serial code and ensure that it’s working correctly. Compile with the debugging flags “-g -Mbounds -Ktrap=fp -Mchkptr -Mchkstk” and see if anything abnormal occurs during the run. Also, try using Valgrind (http://www.valgrind.org) to see if you have any uninitialized memory issues.
Once the serial code checks out, try running the OpenMP version, again with the debugging flags enabled. Also, try setting your environment’s stack limit to unlimited (for example, “ulimit -s unlimited” in bash), since OpenMP programs use a lot more stack space than a serial program. That said, stack overflows typically occur at a call, not at the top of a for loop.
Also, are you checking the return value from malloc? Perhaps you’re running out of memory?
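If you aren’t checking yet, something along these lines would do. This is only a sketch: checked_malloc and its "what" label are hypothetical names for the example, not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: behaves like malloc, but reports a failed request
   on stderr so it shows up before the NULL pointer gets dereferenced. */
void *checked_malloc(size_t bytes, const char *what)
{
    void *p = malloc(bytes);

    /* malloc(0) is allowed to return NULL, so only report real failures. */
    if (p == NULL && bytes != 0)
        fprintf(stderr, "malloc of %zu bytes for %s failed\n", bytes, what);

    return p;
}
```

You would call it in place of malloc, for example checked_malloc(n * sizeof(double), "row buffer"), and still test the result for NULL before using it.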
Let us know how it goes,