Program aborts for no obvious reason with openMP

Hello, I am having a program using the following subroutine:


subroutine conjGrad1(Kffv,bvec,n,nrows,rowIndex,Icol,Idiag,ierr)

implicit none

! solve a system of equations [A]{x}={b}, using the conjugate gradient method

integer n, nrows ! n: number of values in the column representation of sparse matrix, nrows: number of rows in system of equations
real8 Kffv(n),bvec(nrows), xvec(nrows)
8 rvec(nrows)
real8 normv1,normv2, normv0

integer rowIndex(nrows+1)
integer Icol(n)
8 Kdiag(nrows)
integer Idiag(nrows)
real8 pvec1(nrows), vec1(nrows)
8 alpha1,beta1,denom1

integer i1,i2, iiter, ierr

ierr = 0
xvec = 0.d0
pvec1 = bvec
normv1 = 0.d0

do iiter=1,1000

if(iiter.eq.1) then
rvec = bvec
pvec1 = bvec

normv1 = dot_product(bvec,bvec)

normv2 = normv1
normv0 = normv1
beta1 = normv1/normv2
!write(101,) ‘step 1…’
do i1 = 1,nrows
pvec1(i1) = rvec(i1) + beta1
end do !i1

! begin by calculating vec1, which is the product [A]{p}
vec1 = 0.d0
!write(101,*) ‘step 2…’
do i1 = 1,nrows
do i2 = rowIndex(i1), rowIndex(i1+1) - rowIndex(1)
vec1(i1) = vec1(i1) + Kffv(i2)*pvec1(Icol(i2))
end do !i2
end do !i1

denom1 = dot_product(vec1,vec1)

alpha1 = normv1/denom1
!write(101,) ‘step 3…’
do i1 = 1,nrows
xvec(i1) = xvec(i1) + alpha1
rvec(i1) = rvec(i1) - alpha1*vec1(i1)
end do !i1

normv2 = normv1

normv1 = dot_product(rvec,rvec)

if(normv1.le.1.d-24normv0) then
elseif(normv0.le.1.d-8.and.normv1.le.1.d-38) then
) iiter,normv1
end do! iiter

if( ierr = 1

bvec = xvec ! finally, when we return from the subroutine, we replace the RHS vector bvec by the solution vector xvec

!write(501,) ‘iiter=’,iiter
) normv1,normv0
end subroutine conjGrad1]


When I try to run my program, it aborts for no obvious reason, in one of the parallel regions. When I rerun the same exact program (without any change in the code) with 1 thread, it runs without any problem. What could I be doing wrong with my openMP directives?

I think I found what was wrong in my code. I was setting the number of OMP threads through calling the OMP_SET_NUM_THREADS function in my main program. I was calling OMP_SET_NUM_THREADS again in one of the subroutines, thinking that I need to re-define the number of threads in the local computation of the subroutine. This was causing the unpredictable behavior. After removing the second call to OMP_SET_NUM_THREADS, the program does work without any problem.

Was the second call to omp_set_num_threads inside or outside of a parallel region? If outside, then it seems like you can reproduce the problem by just calling omp_set_num_threads(2) twice in a row?

No, the second call was outside a parallel region. I basically had a parallel region in the main program. After the end of that parallel region, I was calling the subroutine which was locally setting the number of threads. Also, the program was running normally when it was built with Intel’s compiler. So yes, the problem appears to be due to calling the omp_set_num_threads more than once in a program.
Interestingly, the program crashed after the specific subroutine was called a number of times. So, it may take a number of calls to omp_set_num_threads for the crash to occur.