I’m trying to call a subroutine that contains a target region from within a task, but the application gets a SIGSEGV. This is the subroutine with the target region:
      subroutine add2s2_omp(a,b,c1,n)
      real a(n),b(n)
!$OMP TARGET TEAMS LOOP
      do i=1,n
         a(i)=a(i)+c1*b(i)
      enddo
      return
      end
And I call it like this:
!$OMP TASK
      call add2s2_omp(b,bb(1,1),-alpha(1),n)
!$OMP END TASK
!$OMP TASKWAIT
The application reports:
[jwb0033:3448 :0:3448] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
nek5000: malloc.c:4048: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
[jwb0033:3450 :0:3450] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
[jwb0033:3446 :0:3446] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
nek5000: malloc.c:4048: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
Is it possible to call a routine that contains a target region from a task? I’m using NVHPC/21.5-GCC-10.3.0 and ParaStation MPI. Thanks.
I prepared a small test with the same dimensions used in the nek5000 test. The application does not segfault, but when using a task the output array is not modified. Without the task, the output is modified as expected. Could this be part of the problem?
A task needs to be within a parallel region, and the test works correctly once I add one. I’m not sure whether this is the problem in the full code, but it’s what’s wrong here.
% cat task_test.f
      program task_test
      implicit none
      real, dimension(:), allocatable :: b
      real, dimension(:, :), allocatable :: bb
      real alpha
      integer i, n, m
      n = 4669440
      m = 1
      alpha = 1.3
      allocate(b(n))
      allocate(bb(n,m))
      do i=1, n
         b(i) = 1.1
         bb(i,1) = 1.2
      end do
!$OMP PARALLEL
!$OMP SINGLE
!$OMP TASK
      call add2s2_omp(b,bb(1,1),alpha,n)
!$OMP END TASK
!$OMP END SINGLE
!$OMP TASKWAIT
!$OMP END PARALLEL
      do i=1, 10
         write(*,*) b(i)
      end do
      deallocate(b)
      deallocate(bb)
      end program
      subroutine add2s2_omp(a,b,c1,n)
      real a(n),b(n)
!$OMP TARGET TEAMS LOOP
      do i=1,n
         a(i)=a(i)+c1*b(i)
      enddo
!$OMP END TARGET TEAMS LOOP
      return
      end
% nvfortran -mp -acc task_test.f -Minfo=accel; a.out
add2s2_omp:
40, Generating implicit map(tofrom:b(:),a(:))
2.660000
2.660000
2.660000
2.660000
2.660000
2.660000
2.660000
2.660000
2.660000
2.660000
Hi Mat, yes, I totally agree. Now the application works with no errors. But I have another question: using this approach in a loop, performance is quite bad:
!$OMP PARALLEL
      do k = 2,m
!$OMP TASK
         call add2s2_omp(xbar,xx(1,k),alpha(k),n)
!$OMP END TASK
!$OMP TASK
         call add2s2_omp(bbar,bb(1,k),alpha(k),n)
!$OMP END TASK
!$OMP TASK
         call add2s2_omp(b,bb(1,k),-alpha(k),n)
!$OMP END TASK
!$OMP TASKWAIT
      enddo
!$OMP END PARALLEL
I rarely use tasks myself, so I may not be of much help, but don’t you need a “single” region so that each thread doesn’t spawn every task? I don’t know if this would fix the performance issue, but you’re generating more tasks than needed here.
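To make the “single” suggestion concrete, here is a minimal sketch of how the loop could be restructured. Without a SINGLE, every thread in the parallel region runs the whole k loop, so each task is created once per thread and each update is applied more than once. The program name task_loop, the array sizes, and the initial values are made up for illustration; only add2s2_omp and the shape of the loop come from the posts above. The sketch assumes the three updates within one iteration are independent of each other, which appears to be the case since they write to xbar, bbar and b respectively.

```fortran
      program task_loop
      implicit none
!     Hypothetical sizes and values, chosen only for illustration.
      integer, parameter :: n = 1000, m = 4
      real, allocatable :: xbar(:), bbar(:), b(:)
      real, allocatable :: xx(:,:), bb(:,:)
      real alpha(m)
      integer k

      allocate(xbar(n), bbar(n), b(n), xx(n,m), bb(n,m))
      xbar = 0.0
      bbar = 0.0
      b = 0.0
      xx = 1.0
      bb = 2.0
      alpha = 0.5

!$OMP PARALLEL
!$OMP SINGLE
!     Only one thread runs the loop and creates the tasks; the
!     other threads of the team can pick them up and execute them.
      do k = 2, m
!$OMP TASK FIRSTPRIVATE(k)
         call add2s2_omp(xbar, xx(1,k), alpha(k), n)
!$OMP END TASK
!$OMP TASK FIRSTPRIVATE(k)
         call add2s2_omp(bbar, bb(1,k), alpha(k), n)
!$OMP END TASK
!$OMP TASK FIRSTPRIVATE(k)
         call add2s2_omp(b, bb(1,k), -alpha(k), n)
!$OMP END TASK
!     The next iteration updates the same arrays, so wait for the
!     three tasks of this iteration before creating more.
!$OMP TASKWAIT
      enddo
!$OMP END SINGLE
!$OMP END PARALLEL

      write(*,*) xbar(1), bbar(1), b(1)
      deallocate(xbar, bbar, b, xx, bb)
      end program

      subroutine add2s2_omp(a,b,c1,n)
      implicit none
      integer n, i
      real a(n), b(n), c1
!$OMP TARGET TEAMS LOOP
      do i=1,n
         a(i)=a(i)+c1*b(i)
      enddo
!$OMP END TARGET TEAMS LOOP
      return
      end
```

With these made-up inputs, each array receives three updates (k = 2, 3, 4), so the printed values should be 1.5, 3.0 and -3.0 if every task runs exactly once. Whether this helps performance is a separate question: each call still maps its arrays to and from the device through the implicit map(tofrom:...) shown in the compiler output, and that data movement may dominate the runtime.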