Hi Mat (and anyone else!)
OK, for testing purposes I’ve put the following in my parallel routine:
      print*,'Max Threads:',omp_get_max_threads()
!$OMP PARALLEL SHARED(F,NPTS,NE,E,FL,BM) DEFAULT(NONE)
!$OMP DO PRIVATE(I, FLUX,L,B)
      DO J=1, Npts
         FLUX.N = nE
         DO I=1, nE
            FLUX.ENERGY(i) = E(i)
         ENDDO
         DO I=1,100
            flux.iflux(1) = flux.iflux(1) +
     &         (I**0.5)*ALOG(J)*COS(I*3.14159)
         ENDDO
C        CALL GETFLUX(FLUX, FL(J), BM(J))
D        PRINT*,'FL, BM:', L,B,Flux.Iflux(1)
         DO I=1,FLUX.N
D           PRINT*,FLUX.Energy(i), Flux.IFlux(i), Flux.DFlux(i)
            F(i,J) = FLUX.IFlux(i)
         ENDDO
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
The
print*,'Max Threads:',omp_get_max_threads()
statement reports (correctly?) the value of NCPUS. Unfortunately, when I watch the CPU usage, only one CPU ever seems to be busy, and consequently the run time is about the same as for a single thread (NCPUS=1). Here’s the output for NCPUS=1 and then NCPUS=2:
Max Threads: 1
Module        Type  Count  Only(s)    Avg.(s)    Time(s)    Avg.(s)
SPENVIS_TREP  (U)   1      15.860973  15.860973  15.923153  15.923153
Max Threads: 2
Module        Type  Count  Only(s)    Avg.(s)    Time(s)    Avg.(s)
SPENVIS_TREP  (U)   1      15.785954  15.785954  15.848143  15.848143
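One further check that might help narrow this down: omp_get_max_threads() only reports how many threads a parallel region is allowed to use, whereas omp_get_num_threads(), called from inside a region, reports how many threads were actually started. The following is a minimal standalone sketch of that check, not code from trep_sp.f. The names team_check, team_size and check_team are hypothetical, and it assumes a free-form source file built with OpenMP enabled (for example with the -mp flag used in the make output).

```fortran
! Hypothetical standalone check, not part of trep_sp.f.
! Assumes free-form source and a compiler run with OpenMP enabled.
module team_check
  use omp_lib
  implicit none
contains
  ! Returns the number of threads that actually form the team
  ! when a parallel region is entered.
  integer function team_size()
    integer :: n
    n = 0
    !$omp parallel default(none) shared(n)
    !$omp single
    ! Exactly one thread records the team size, so there is no race on n.
    n = omp_get_num_threads()
    !$omp end single
    !$omp end parallel
    team_size = n
  end function team_size
end module team_check

program check_team
  use omp_lib
  use team_check
  implicit none
  ! Upper bound on the team size, as already printed in the routine above.
  print *, 'Max Threads:       ', omp_get_max_threads()
  ! Size of the team a parallel region really gets.
  print *, 'Threads in region: ', team_size()
end program check_team
```

If the second number stays at 1 while Max Threads is 2, that would suggest the team is not being created at run time. If both are 2, the threads exist and the lack of speed-up probably has another cause.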
The output from the make command is:
pgf90 -i8 -fPIC -Bstatic -tp p7-64 -fast -mp -Minfo -c -o trep_sp.o trep_sp.f
PGF90-W-0093-Type conversion of expression performed (trep_sp.f: 50)
0 inform, 1 warnings, 0 severes, 0 fatal for trep_sp
trep_sp:
     41, Parallel region activated
     43, Parallel loop activated; static block iteration allocation
     61, Barrier
         Parallel region terminated
rm -f idlSpenvisTrep.linux.64.a
pgf90 -o idlSpenvisTrep.linux.64.so idlSpenvisTrep.o trep_sp.o bext.o bint.o format.o putils.o shell.o transfos.o ae8max.o ap8max.o ecp95bd.o models.o psb97bd.o trepltv.o trepstat.o up8min.o ae8min.o ap8min.o ecvbd.o pcp94bd.o trarap.o up8max.o utils.o -fast -mp -Minfo -shared -fPIC -tp p7-64
Any help getting this to distribute across the CPUs would be gratefully received.
Thanks,
Hugh