acc_set_device_num question

AROM · September 29, 2015, 2:28pm

Hi Mat,

In my code I have

        use openacc
        use omp_lib

.....
       integer iam,zs,ze
....

        ngpus = acc_get_num_devices(acc_device_nvidia)

        print *, "ngpus", ngpus
        call omp_set_num_threads(ngpus)
!$OMP PARALLEL shared (ngpus) private(iam,zs,ze)
        iam = omp_get_thread_num()
        call acc_set_device_num(iam,acc_device_nvidia)

        zs = (iam+0)*((NZ+ngpus-1)/ngpus)
        ze = (iam+1)*((NZ+ngpus-1)/ngpus)
        if(ze.gt.nz) ze = nz
        if(zs.eq.0) zs = 1
        if(zs.ne.1) zs = zs-1
        if(ze.ne.nz) ze = ze+1
        print *, iam, zs, ze
!$acc enter data create( psi1(:,:,zs:ze,:), psi2(:,:,zs:ze,:))
!$OMP END PARALLEL

The compiler (PGI 15.4, 15.7) tells me:

 pgfortran  -m64 -ta=nvidia,cc3.5,nodebug,cuda7.0,pin -mcmodel=medium test.f90 -i8 -Mlarge_arrays -O3 -mp  -acc -o test
PGF90-S-0450-Argument number 1 to acc_set_device_num: kind mismatch (test.f90: 93)

What is wrong with the code?

Thanks
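The slab bounds in the code above can be checked on their own, without a GPU. The sketch below copies the posted arithmetic into a hypothetical module routine `slab_bounds` (the name is illustrative, not part of any library): each thread gets a slab of ceil(NZ/ngpus) planes, clamped to 1..NZ and then extended by one plane on every side that does not touch the domain boundary. For NZ = 500 and three GPUs it yields the same bounds that the full test case prints later in the thread (0: 1–168, 1: 166–335, 2: 333–500).

```fortran
module slab_bounds_sketch
  implicit none
contains
  ! Hypothetical helper mirroring the posted bounds arithmetic.
  ! iam is the 0-based OpenMP thread number, one thread per GPU.
  subroutine slab_bounds(iam, ngpus, nz, zs, ze)
    integer, intent(in)  :: iam, ngpus, nz
    integer, intent(out) :: zs, ze
    integer :: chunk
    chunk = (nz + ngpus - 1) / ngpus   ! ceiling division: planes per slab
    zs = iam * chunk
    ze = (iam + 1) * chunk
    if (ze > nz) ze = nz               ! last slab may be shorter
    if (zs == 0) zs = 1                ! first slab starts at plane 1
    if (zs /= 1) zs = zs - 1           ! extend lower bound by one plane
    if (ze /= nz) ze = ze + 1          ! extend upper bound by one plane
  end subroutine slab_bounds
end module slab_bounds_sketch
```

Keeping the arithmetic in one routine means the OpenMP region only has to call it with `omp_get_thread_num()` and pass the result to the `enter data` clause.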

MatColgrove · September 29, 2015, 6:01pm

Hi AROM,

It’s expecting an “integer(kind=4)” argument, but you’re compiling with the “-i8” flag, so plain “integer” gets a default kind of 8. To fix it, declare “iam” with kind=4:

       integer(4) iam
       integer zs,ze

Hope this helps,
Mat
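Another option, sketched here under the assumption that the routine only needs a 4-byte device number as described above, is to leave `iam` at the default kind and convert it at the call site with the `INT` intrinsic, i.e. `call acc_set_device_num(int(iam, kind=4), acc_device_nvidia)`. Because no OpenACC runtime is assumed, the hypothetical `set_device_stub` below stands in for `acc_set_device_num`; `int32` from `iso_fortran_env` is the portable spelling of the 4-byte kind that PGI writes as `4`.

```fortran
module device_select_sketch
  use iso_fortran_env, only: int32, int64
  implicit none
  ! Device number most recently passed to the stub (for illustration only).
  integer(int32) :: last_device = -1
contains
  ! Hypothetical stand-in for acc_set_device_num: its explicit interface
  ! accepts only a 4-byte integer device number.
  subroutine set_device_stub(devnum)
    integer(int32), intent(in) :: devnum
    last_device = devnum
  end subroutine set_device_stub

  ! iam has the kind that default INTEGER gets under -i8 (8 bytes).
  subroutine select_for_thread(iam)
    integer(int64), intent(in) :: iam
    ! Passing iam directly would be a compile-time kind mismatch;
    ! INT(..., KIND=int32) converts it explicitly to the expected kind.
    call set_device_stub(int(iam, kind=int32))
  end subroutine select_for_thread
end module device_select_sketch
```

Converting at the call site keeps the rest of the code on the `-i8` default kind, at the cost of one explicit conversion per runtime call.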

AROM · September 30, 2015, 5:43am

Elementary, my dear Watson!

Thank you Mat!

AROM · September 30, 2015, 6:15am

Hi Mat!

One more question. Here is the full test case:

    PROGRAM VORTEX
        use openacc
        use omp_lib
        IMPLICIT NONE
        INTEGER :: NX,NY,NZ,npltx,nplty,npltz

        PARAMETER(NX=500,NY=500,NZ=500)
        REAL*8 :: EPSPH,TH,TWOPI
        REAL*8, DIMENSION(:,:,:,:), ALLOCATABLE:: psi1,psi2
        iNTEGER :: ix,iy,iz,ix1,iy1,iz1
        integer(kind=4) :: ierr, ngpus,zs,ze,iam

        allocate(PSI1(NX,NY,NZ,1:2),stat=ierr)
        if(ierr /= 0)write(*,*)"allocation error PSI"
        allocate(PSI2(NX,NY,NZ,1:2),stat=ierr)
        if(ierr /= 0)write(*,*)"allocation error PSI2"
        TWOPI = 2.0D0*ACOS(-1.0D0)

        ngpus = acc_get_num_devices(acc_device_nvidia)

        print *, "ngpus", ngpus
        call omp_set_num_threads(ngpus)
!$OMP PARALLEL shared (ngpus) private(iam,zs,ze)
        iam = omp_get_thread_num()

        zs = (iam+0)*((NZ+ngpus-1)/ngpus)
        ze = (iam+1)*((NZ+ngpus-1)/ngpus)
        if(ze.gt.nz) ze = nz
        if(zs.eq.0) zs = 1
        if(zs.ne.1) zs = zs-1
        if(ze.ne.nz) ze = ze+1
        print *, iam, zs, ze
!       iam = iam + 1
        call acc_set_device_num(iam,acc_device_nvidia)
!$acc enter data create( psi1(:,:,zs:ze,:), psi2(:,:,zs:ze,:))
!$OMP END PARALLEL

        return
!$acc data create(PSI1,PSI2)
        EPSPH = 0.02D0
        TH = TWOPI
!$acc kernels
!!$OMP PARALLEL DO
        DO IZ1=1,NZ
          DO IY1=1,NY
            DO IX1=1,NX
              PSI1(IX1,IY1,IZ1,2) = EPSPH*COS(TH)
              PSI2(IX1,IY1,IZ1,2) = EPSPH*SIN(TH)
              PSI1(IX1,IY1,IZ1,1) = EPSPH*COS(TH)
              PSI2(IX1,IY1,IZ1,1) = EPSPH*SIN(TH)
            ENDDO
          ENDDO
        ENDDO
!$acc end kernels
!$acc end data
!!$OMP END PARALLEL DO

        deallocate(PSI1,PSI2)
        STOP
    END

Compilation:

pgfortran  -m64 -ta=nvidia,cc3.5,nodebug,cuda7.0,pin -mcmodel=medium test1.f90 -i8 -Mlarge_arrays -O3 -mp  -acc -o test1

Launching:

./test1
 ngpus            3
            0            1          168
            1          166          335
            2          333          500
FATAL ERROR: variable in data clause was already present on device 3: name=psi1
 file:/home-2/..../test1.f90 vortex line:36
psi1 lives at 0x2b58019a5020 size 2000000000 present
Present table dump for device[3]: NVIDIA Tesla GPU 3, compute capability 3.5
host:0x2b58019a5020 device:0x230d9e0000 size:2000000000 presentcount:1 line:36 name:psi1
host:0x2b58019a5020 device:0x2384d40000 size:2000000000 presentcount:1 line:36 name:psi1
call to cuMemAlloc returned error 4: Deinitialized
Failing in Thread:2

Next run:

$ PGI_ACC_DEBUG=1 ./test1  2>&1 | grep devid
pgi_uacc_set_device_num(devnum=0,devtype=4,threadid=1) cuda devid=1 dindex=1
pgi_uacc_dataenterstart( file=/home-2/..../test1.f90, function=vortex, line=2:2, line=36, devid=0 )
pgi_uacc_set_device_num(devnum=2,devtype=4,threadid=3) cuda devid=3 dindex=3
pgi_uacc_set_device_num(devnum=1,devtype=4,threadid=2) cuda devid=2 dindex=2
pgi_uacc_dataenterstart( file=/home-2/..../test1.f90, function=vortex, line=2:2, line=36, devid=0 )
pgi_uacc_dataenterstart( file=/home-2/..../test1.f90, function=vortex, line=2:2, line=36, devid=0 )
pgi_uacc_alloc(size=2000000000,devid=3,threadid=2)
pgi_uacc_alloc(size=2000000000,devid=3,threadid=1)
pgi_uacc_alloc(size=2000000000,devid=3,threadid=2) returns 0x230d9e0000
pgi_uacc_alloc(size=2000000000,devid=3,threadid=1) returns 0x2384d40000
pgi_uacc_alloc(size=2000000000,devid=3,threadid=2)

I’m confused by the fact that a different device is set active in each thread, yet memory is allocated on device 3 only. Could you explain this, please? Is it a PGI issue or mine?

Alexey

MatColgrove · September 30, 2015, 4:15pm

Hi Alexey,

This looks like a PGI error. We had a race condition in the present table that caused similar errors, but it was fixed in 15.7. This error looks like that one, yet it still occurs in 15.7 and 15.9, though not in our internal development compiler. I’ll need to talk with our compiler engineers to see whether they have already found this issue, and we will add the fix to a future release.

I added TPR#21991 to track this issue.

Thanks,
Mat
