Can I Write CUDA Fortran Code in a Fortran File (.F90)?

I know there is a compiler option, “-Mcuda”, that can be used to compile CUDA Fortran code in files with extensions other than .cuf or .CUF.

Does this mean that I can write CUDA Fortran code in a file with the extension “F90”, and even mix CUDA Fortran code with ordinary Fortran code in the same “F90” file?

Thanks!

Yes. A .cuf or .CUF extension simply tells the compiler that the file contains CUDA Fortran. With F90 or other extensions, you just need to add the -Mcuda flag explicitly.
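As a rough illustration (not taken from any real project), the sketch below keeps a device kernel and ordinary host Fortran in one .F90 file. The file name and the names saxpy_mod, saxpy_kernel and saxpy_mixed are hypothetical, and the build line in the comment assumes the PGI pgfortran driver.

```fortran
! saxpy_mixed.F90 (hypothetical file name)
! Build, assuming the PGI compilers: pgfortran -Mcuda saxpy_mixed.F90
module saxpy_mod
  use cudafor
  implicit none
contains
  ! Device code: each thread updates one element of y.
  attributes(global) subroutine saxpy_kernel(n, a, x, y)
    integer, value :: n
    real, value :: a
    real, device :: x(n), y(n)
    integer :: i
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy_kernel
end module saxpy_mod

program saxpy_mixed
  use cudafor
  use saxpy_mod
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n)
  real, device :: x_d(n), y_d(n)
  integer :: istat

  ! Ordinary host Fortran.
  x = 1.0
  y = 2.0

  ! Host-to-device copies are written as plain assignments.
  x_d = x
  y_d = y

  ! Round the block count up so that all n elements are covered.
  call saxpy_kernel<<<(n + 255) / 256, 256>>>(n, 3.0, x_d, y_d)
  istat = cudaDeviceSynchronize()

  ! Device-to-host copy, then a host-side check against 3*1 + 2 = 5.
  y = y_d
  print *, 'max error:', maxval(abs(y - 5.0))
end program saxpy_mixed
```

Without -Mcuda, the same file would be rejected at the attributes(global) and device declarations, which is why the flag matters for non-.cuf extensions.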

  • Mat

Thank you very much!
I ran into a new problem at run time. I suspect there isn’t enough GPU memory, because I am running inside a large model (CESM) and the input data is also large, but I don’t fully understand the error message.
Is there anything wrong with my code?

My CUDA Fortran code is as follows:

module mix_sub1
  use cudafor
contains

  attributes(global) subroutine mmul_kernel(ny_block,nx_block,FX,n,CX,SF_SUBM_X,TZ, &
                                            ieast,ktp,kbt,k,bid,iwest,kp1,nt,km)
    integer, value :: ny_block,nx_block,n,ieast,iwest,ktp,kbt,k,bid,kp1,nt,km
    real(selected_real_kind(12)), device :: FX(nx_block,ny_block,nt),  &
       SF_SUBM_X(nx_block,ny_block,2,ktp,k,bid),                       &
       TZ(nx_block,ny_block,km,nt,bid), CX(nx_block,ny_block)

    integer :: i, j, tx, ty, jp1, ip1, ki, kj

    ! Global indices of this thread (computed, but not used by the loops below)
    tx = threadidx%x
    ty = threadidx%y
    i = (blockidx%x-1) * blockdim%x + tx
    j = (blockidx%y-1) * blockdim%y + ty

    ! Every thread runs these same loops: kj steps by 16, ki covers 1..nx_block-1
    do kj = 1, ny_block, 16
      do ki = 1, nx_block-1
        FX(ki,kj,n) = CX(ki,kj)                                                &
           * ( SF_SUBM_X(ki  ,kj,ieast,ktp,k,bid) * TZ(ki,kj,k,n,bid)          &
             + SF_SUBM_X(ki  ,kj,ieast,kbt,k,bid) * TZ(ki,kj,kp1,n,bid)        &
             + SF_SUBM_X(ki+1,kj,iwest,ktp,k,bid) * TZ(ki+1,kj,k,n,bid)        &
             + SF_SUBM_X(ki+1,kj,iwest,kbt,k,bid) * TZ(ki+1,kj,kp1,n,bid) )
      enddo
    enddo
    call syncthreads()
  end subroutine mmul_kernel

!===============================================================================
  subroutine mmul(ny_block,nx_block,FX,n,CX,SF_SUBM_X,TZ,ieast,ktp,kbt,k,bid,iwest,kp1,nt,km)
    ! Host arrays
    real(selected_real_kind(12)), dimension(:,:,:,:,:,:) :: SF_SUBM_X
    real(selected_real_kind(12)), dimension(:,:,:,:,:) :: TZ
    real(selected_real_kind(12)), dimension(:,:,:) :: FX
    real(selected_real_kind(12)), dimension(:,:) :: CX
    integer :: ny_block,nx_block,n,ieast,iwest,ktp,kbt,k,bid,kp1,nt,km

    ! Device copies of the arrays above
    real(selected_real_kind(12)), device, allocatable, dimension(:,:,:,:,:,:) :: SF_SUBM_Xdev
    real(selected_real_kind(12)), device, allocatable, dimension(:,:,:,:,:) :: TZdev
    real(selected_real_kind(12)), device, allocatable, dimension(:,:,:) :: FXdev
    real(selected_real_kind(12)), device, allocatable, dimension(:,:) :: CXdev

    type(dim3) :: dimGrid, dimBlock
    integer :: r
    real ctimeall, ctimekernel, flops, mflopskernel, mflopsall
    integer c1, c2, c3, c4

    call system_clock( count=c1 )
    allocate( FXdev(nx_block,ny_block,nt),                 &
              SF_SUBM_Xdev(nx_block,ny_block,2,ktp,k,bid), &
              TZdev(nx_block,ny_block,km,nt,bid),          &
              CXdev(nx_block,ny_block) )

    ! Copy the inputs from host to device
    SF_SUBM_Xdev = SF_SUBM_X(1:nx_block,1:ny_block,1:2,1:ktp,1:k,1:bid)
    TZdev = TZ(1:nx_block,1:ny_block,1:km,1:nt,1:bid)
    CXdev = CX(1:nx_block,1:ny_block)

    ! Launch configuration: 16x16 threads per block
    dimGrid = dim3( ny_block/16, (nx_block-1)/16, 1 )
    dimBlock = dim3( 16, 16, 1 )
    call system_clock( count=c2 )
    call mmul_kernel<<<dimGrid,dimBlock>>>(ny_block,nx_block,FXdev,n,CXdev,SF_SUBM_Xdev,TZdev, &
                                           ieast,ktp,kbt,k,bid,iwest,kp1,nt,km)
    r = cudathreadsynchronize()
    call system_clock( count=c3 )

    ! Copy the result from device to host
    FX(1:nx_block,1:ny_block,1:nt) = FXdev

    call system_clock( count=c4 )
    flops = float(ncol) * float(km)
    ctimekernel = c3 - c2
    mflopskernel = flops / ctimekernel
    ctimeall = c4 - c1
    mflopsall = flops / ctimeall
    print *, 'Kernel time excluding data xfer:', ctimekernel, ' microseconds'
    print *, 'Megaflops excluding data xfer:  ', mflopskernel
    print *, 'Total time including data xfer: ', ctimeall, ' microseconds'
    print *, 'Megaflops including data xfer:  ', mflopsall
    deallocate(FXdev,SF_SUBM_Xdev,TZdev,CXdev)
  end subroutine mmul
end module mix_sub1

And here are the error messages:


(seq_comm_setcomm) initialize ID ( 1 GLOBAL ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 2 CPL ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 17 ATM ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 18 CPLATM ) join IDs = 2 17 ( npes = 16) ( nthreads = 1)
(seq_comm_jcommarr) initialize ID ( 3 ALLATMID ) join multiple comp IDs ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 10 CPLALLATMID ) join IDs = 2 3 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 19 LND ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 20 CPLLND ) join IDs = 2 19 ( npes = 16) ( nthreads = 1)
(seq_comm_jcommarr) initialize ID ( 4 ALLLNDID ) join multiple comp IDs ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 11 CPLALLLNDID ) join IDs = 2 4 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 21 OCN ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 22 CPLOCN ) join IDs = 2 21 ( npes = 16) ( nthreads = 1)
(seq_comm_jcommarr) initialize ID ( 5 ALLOCNID ) join multiple comp IDs ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 12 CPLALLOCNID ) join IDs = 2 5 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 23 ICE ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 24 CPLICE ) join IDs = 2 23 ( npes = 16) ( nthreads = 1)
(seq_comm_jcommarr) initialize ID ( 6 ALLICEID ) join multiple comp IDs ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 13 CPLALLICEID ) join IDs = 2 6 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 25 GLC ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 26 CPLGLC ) join IDs = 2 25 ( npes = 16) ( nthreads = 1)
(seq_comm_jcommarr) initialize ID ( 7 ALLGLCID ) join multiple comp IDs ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 14 CPLALLGLCID ) join IDs = 2 7 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 27 ROF ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 28 CPLROF ) join IDs = 2 27 ( npes = 16) ( nthreads = 1)
(seq_comm_jcommarr) initialize ID ( 8 ALLROFID ) join multiple comp IDs ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 15 CPLALLROFID ) join IDs = 2 8 ( npes = 16) ( nthreads = 1)
(seq_comm_setcomm) initialize ID ( 29 WAV ) pelist = 0 15 1 ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 30 CPLWAV ) join IDs = 2 29 ( npes = 16) ( nthreads = 1)
(seq_comm_jcommarr) initialize ID ( 9 ALLWAVID ) join multiple comp IDs ( npes = 16) ( nthreads = 1)
(seq_comm_joincomm) initialize ID ( 16 CPLALLWAVID ) join IDs = 2 9 ( npes = 16) ( nthreads = 1)
(seq_comm_printcomms) 1 0 16 1 GLOBAL:
(seq_comm_printcomms) 2 0 16 1 CPL:
(seq_comm_printcomms) 3 0 16 1 ALLATMID:
(seq_comm_printcomms) 4 0 16 1 ALLLNDID:
(seq_comm_printcomms) 5 0 16 1 ALLOCNID:
(seq_comm_printcomms) 6 0 16 1 ALLICEID:
(seq_comm_printcomms) 7 0 16 1 ALLGLCID:
(seq_comm_printcomms) 8 0 16 1 ALLROFID:
(seq_comm_printcomms) 9 0 16 1 ALLWAVID:
(seq_comm_printcomms) 10 0 16 1 CPLALLATMID:
(seq_comm_printcomms) 11 0 16 1 CPLALLLNDID:
(seq_comm_printcomms) 12 0 16 1 CPLALLOCNID:
(seq_comm_printcomms) 13 0 16 1 CPLALLICEID:
(seq_comm_printcomms) 14 0 16 1 CPLALLGLCID:
(seq_comm_printcomms) 15 0 16 1 CPLALLROFID:
(seq_comm_printcomms) 16 0 16 1 CPLALLWAVID:
(seq_comm_printcomms) 17 0 16 1 ATM:
(seq_comm_printcomms) 18 0 16 1 CPLATM:
(seq_comm_printcomms) 19 0 16 1 LND:
(seq_comm_printcomms) 20 0 16 1 CPLLND:
(seq_comm_printcomms) 21 0 16 1 OCN:
(seq_comm_printcomms) 22 0 16 1 CPLOCN:
(seq_comm_printcomms) 23 0 16 1 ICE:
(seq_comm_printcomms) 24 0 16 1 CPLICE:
(seq_comm_printcomms) 25 0 16 1 GLC:
(seq_comm_printcomms) 26 0 16 1 CPLGLC:
(seq_comm_printcomms) 27 0 16 1 ROF:
(seq_comm_printcomms) 28 0 16 1 CPLROF:
(seq_comm_printcomms) 29 0 16 1 WAV:
(seq_comm_printcomms) 30 0 16 1 CPLWAV:
(t_initf) Read in prof_inparm namelist from: drv_in
8 MB memory alloc in MB is 8.00seq_flds_mod: read seq_cplflds_inparm namelist from: drv_in

8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1020.02
seq_flds_mod: read seq_cplflds_userspec namelist from: drv_in
8 MB memory alloc in MB is 8.00
8 MB memory alloc in MB is 8.008 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1020.02

8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1020.51
8 MB memory alloc in MB is 8.00
8 MB memory alloc in MB is 8.008 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1019.52
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1018.53
8 MB memory alloc in MB is 8.008 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1051.20
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1018.03
8 MB memory alloc in MB is 8.008 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1018.53
8 MB memory alloc in MB is 8.00

8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1018.03
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1019.52
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1019.02

8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1017.54
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1049.63

8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1019.02
8 MB memory alloc in MB is 8.00
8 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1019.02
seq_flds_mod: seq_flds_a2x_states=
Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Sa_dens:Sa_pslv
seq_flds_mod: seq_flds_a2x_fluxes=
Faxa_rainc:Faxa_rainl:Faxa_snowc:Faxa_snowl:Faxa_lwdn:Faxa_swndr:
Faxa_swvdr:Faxa_swndf:Faxa_swvdf:Faxa_swnet:Faxa_bcphidry:
Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:
Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:
Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4
seq_flds_mod: seq_flds_x2a_states=
Sf_lfrac:Sf_ifrac:Sf_ofrac:Sx_avsdr:Sx_anidr:Sx_avsdf:Sx_anidf:Sx_tref:Sx_qref:So_t:Sx_t:Sl_fv:Sl_ram1:Sl_snowh:Si_snowh:So_ssq:So_re:Sx_u10:So_ustar
seq_flds_mod: seq_flds_x2a_fluxes=
Faxx_taux:Faxx_tauy:Faxx_lat:Faxx_sen:Faxx_lwup:Faxx_evap:Fall_flxdst1:Fall_flxdst2:Fall_flxdst3:Fall_flxdst4
seq_flds_mod: seq_flds_l2x_states=
Sl_avsdr:Sl_anidr:Sl_avsdf:Sl_anidf:Sl_tref:Sl_qref:Sl_t:Sl_fv:Sl_ram1:Sl_snowh:Sl_u10
seq_flds_mod: seq_flds_l2x_fluxes=
Fall_swnet:Fall_taux:Fall_tauy:Fall_lat:Fall_sen:Fall_lwup:Fall_evap:Fall_flxdst1:Fall_flxdst2:Fall_flxdst3:Fall_flxdst4:Flrl_rofliq:Flrl_rofice
seq_flds_mod: seq_flds_x2l_states=
Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Slrr_volr
seq_flds_mod: seq_flds_x2l_fluxes=
Faxa_rainc:Faxa_rainl:Faxa_snowc:Faxa_snowl:Faxa_lwdn:Faxa_swndr:
Faxa_swvdr:Faxa_swndf:Faxa_swvdf:Faxa_bcphidry:Faxa_bcphodry:
Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:
Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:
Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Flrr_flood
seq_flds_mod: seq_flds_i2x_states=
Si_avsdr:Si_anidr:Si_avsdf:Si_anidf:Si_tref:Si_qref:Si_t:Si_snowh:Si_u10:Si_ifrac
seq_flds_mod: seq_flds_i2x_fluxes=
Faii_swnet:Fioi_swpen:Faii_taux:Fioi_taux:Faii_tauy:Fioi_tauy:Faii_lat:Faii_sen:Faii_lwup:Faii_evap:Fioi_melth:Fioi_meltw:Fioi_salt
seq_flds_mod: seq_flds_x2i_states=
Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Sa_dens:So_t:So_s:So_u:So_v:So_dhdx:So_dhdy
seq_flds_mod: seq_flds_x2i_fluxes=
Faxa_rain:Faxa_snow:Faxa_lwdn:Faxa_swndr:Faxa_swvdr:Faxa_swndf:
Faxa_swvdf:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:
Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:
Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Fioo_q
seq_flds_mod: seq_flds_o2x_states=
So_t:So_s:So_u:So_v:So_dhdx:So_dhdy:So_bldepth
seq_flds_mod: seq_flds_o2x_fluxes=
Fioo_q
seq_flds_mod: seq_flds_x2o_states=
Sa_pslv:So_duu10n:Si_ifrac:Sw_lamult:Sw_ustokes:Sw_vstokes:Sw_hstokes
seq_flds_mod: seq_flds_x2o_fluxes=
Faxa_rain:Faxa_snow:Faxa_prec:Faxa_lwdn:Foxx_swnet:Faxa_bcphidry:
Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:
Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:
Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Foxx_taux:Foxx_tauy:Foxx_lat:
Foxx_sen:Foxx_lwup:Foxx_evap:Fioi_melth:Fioi_meltw:Fioi_salt:Forr_roff:Forr_ioff
seq_flds_mod: seq_flds_s2x_states=

seq_flds_mod: seq_flds_s2x_fluxes=

seq_flds_mod: seq_flds_x2s_states=

seq_flds_mod: seq_flds_x2s_fluxes=

seq_flds_mod: seq_flds_g2x_states=

seq_flds_mod: seq_flds_g2x_fluxes=

seq_flds_mod: seq_flds_x2g_states=

seq_flds_mod: seq_flds_x2g_fluxes=

seq_flds_mod: seq_flds_xao_states=
So_tref:So_qref:So_ssq:So_re:So_u10:So_duu10n:So_ustar
seq_flds_mod: seq_flds_xao_albedo=
So_avsdr:So_anidr:So_avsdf:So_anidf
seq_flds_mod: seq_flds_r2x_states=
Slrr_volr
seq_flds_mod: seq_flds_r2x_fluxes=
Forr_roff:Forr_ioff:Flrr_flood
seq_flds_mod: seq_flds_x2r_states=

seq_flds_mod: seq_flds_x2r_fluxes=
Flrl_rofliq:Flrl_rofice
seq_flds_mod: seq_flds_w2x_states=
Sw_lamult:Sw_ustokes:Sw_vstokes:Sw_hstokes
seq_flds_mod: seq_flds_w2x_fluxes=

seq_flds_mod: seq_flds_x2w_states=
Sa_u:Sa_v:Sa_tbot:Si_ifrac:So_t:So_u:So_v:So_bldepth
seq_flds_mod: seq_flds_x2w_fluxes=

init_overflows_kmt: KMT = 34 at global (i,j) = 60 4 changed to 33 init_overflows_kmt: KMT = 36 at global (i,j) = 98 4 changed to 35
init_overflows_kmt: KMT = 36 at global (i,j) = 99 4 changed to 35
init_overflows_kmt: KMT = 36 at global (i,j) = 100 4 changed to 35

init_overflows_kmt: KMT = 34 at global (i,j) = 61 4 changed to 33
init_overflows_kmt: KMT = 34 at global (i,j) = 62 4 changed to 33
init_overflows_kmt: KMT = 38 at global (i,j) = 12 93 changed to 37
init_overflows_kmt: KMT = 38 at global (i,j) = 12 94 changed to 37
init_overflows_kmt: KMT = 38 at global (i,j) = 12 95 changed to 37
init_overflows_kmt: KMT = 33 at global (i,j) = 7 109 changed to 32
init_overflows_kmt: KMT = 33 at global (i,j) = 7 110 changed to 32
init_overflows_kmt: KMT = 33 at global (i,j) = 7 111 changed to 32
Overflow: Ross Sea Source region mask at global (ij)= 59 2 Overflow: Weddell Sea Source region mask at global (ij)= 98 2
Overflow: Weddell Sea Source region mask at global (ij)= 99 2
Overflow: Weddell Sea Source region mask at global (ij)=100 2
Overflow: Weddell Sea Source region mask at global (ij)= 98 3
Overflow: Weddell Sea Source region mask at global (ij)= 99 3
Overflow: Weddell Sea Source region mask at global (ij)=100 3
Overflow: Weddell Sea Source adjacent mask at global (ij)= 98 3
Overflow: Weddell Sea Source adjacent mask at global (ij)= 99 3
Overflow: Weddell Sea Source adjacent mask at global (ij)=100 3
Overflow: Weddell Sea Inflow region mask at global (ij)= 98 5
Overflow: Weddell Sea Inflow region mask at global (ij)= 99 5
Overflow: Weddell Sea Inflow region mask at global (ij)=100 5
Overflow: Weddell Sea Inflow region mask at global (ij)= 98 6
Overflow: Weddell Sea Inflow region mask at global (ij)= 99 6
Overflow: Weddell Sea Inflow region mask at global (ij)=100 6
Overflow: Weddell Sea Entrainment region mask at global (ij)= 96 6
Overflow: Weddell Sea Entrainment region mask at global (ij)= 97 6
Overflow: Weddell Sea Entrainment region mask at global (ij)= 98 6
Overflow: Weddell Sea Entrainment adjacent mask at global (ij)= 96 6
Overflow: Weddell Sea Entrainment adjacent mask at global (ij)= 97 6
Overflow: Weddell Sea Entrainment adjacent mask at global (ij)= 98 6
Overflow: Weddell Sea Inflow region mask at global (ij)= 98 7
Overflow: Weddell Sea Inflow region mask at global (ij)= 99 7
Overflow: Weddell Sea Inflow region mask at global (ij)=100 7
Overflow: Weddell Sea Entrainment region mask at global (ij)= 96 7
Overflow: Weddell Sea Entrainment region mask at global (ij)= 97 7
Overflow: Weddell Sea Entrainment region mask at global (ij)= 98 7
Overflow: Weddell Sea Entrainment region mask at global (ij)= 96 8
Overflow: Weddell Sea Entrainment region mask at global (ij)= 97 8
Overflow: Weddell Sea Entrainment region mask at global (ij)= 98 8
Overflow: Weddell Sea Product adjacent mask at global (ij)= 96 7
Overflow: Weddell Sea Product adjacent mask at global (ij)= 97 7
Overflow: Weddell Sea Product adjacent mask at global (ij)= 98 7
Overflow: Weddell Sea Product adjacent mask at global (ij)= 96 8
Overflow: Weddell Sea Product adjacent mask at global (ij)= 97 8
Overflow: Weddell Sea Product adjacent mask at global (ij)= 98 8
Overflow: Weddell Sea Product adjacent mask at global (ij)= 96 9
Overflow: Weddell Sea Product adjacent mask at global (ij)= 97 9
Overflow: Weddell Sea Product adjacent mask at global (ij)= 98 9
Overflow: Weddell Sea Product adjacent mask at global (ij)= 98 9
Overflow: Weddell Sea Product adjacent mask at global (ij)= 98 10
Overflow: Weddell Sea Product adjacent mask at global (ij)= 98 11

Overflow: Ross Sea Source region mask at global (ij)= 60 2
Overflow: Ross Sea Source region mask at global (ij)= 61 2
Overflow: Ross Sea Source region mask at global (ij)= 62 2
Overflow: Ross Sea Source region mask at global (ij)= 59 3
Overflow: Ross Sea Source region mask at global (ij)= 60 3
Overflow: Ross Sea Source region mask at global (ij)= 61 3
Overflow: Ross Sea Source region mask at global (ij)= 62 3
Overflow: Ross Sea Source adjacent mask at global (ij)= 60 3
Overflow: Ross Sea Source adjacent mask at global (ij)= 61 3
Overflow: Ross Sea Source adjacent mask at global (ij)= 62 3
Overflow: Ross Sea Inflow region mask at global (ij)= 60 5
Overflow: Ross Sea Inflow region mask at global (ij)= 61 5
Overflow: Ross Sea Inflow region mask at global (ij)= 62 5
Overflow: Ross Sea Inflow region mask at global (ij)= 60 6
Overflow: Ross Sea Inflow region mask at global (ij)= 61 6
Overflow: Ross Sea Inflow region mask at global (ij)= 62 6
Overflow: Ross Sea Entrainment region mask at global (ij)= 59 6
Overflow: Ross Sea Entrainment region mask at global (ij)= 60 6
Overflow: Ross Sea Entrainment region mask at global (ij)= 61 6
Overflow: Ross Sea Entrainment adjacent mask at global (ij)= 59 6
Overflow: Ross Sea Entrainment adjacent mask at global (ij)= 60 6
Overflow: Ross Sea Entrainment adjacent mask at global (ij)= 61 6
Overflow: Ross Sea Inflow region mask at global (ij)= 60 7
Overflow: Ross Sea Inflow region mask at global (ij)= 61 7
Overflow: Ross Sea Inflow region mask at global (ij)= 62 7
Overflow: Ross Sea Entrainment region mask at global (ij)= 59 7
Overflow: Ross Sea Entrainment region mask at global (ij)= 60 7
Overflow: Ross Sea Entrainment region mask at global (ij)= 61 7
Overflow: Ross Sea Entrainment region mask at global (ij)= 59 8
Overflow: Ross Sea Entrainment region mask at global (ij)= 60 8
Overflow: Ross Sea Entrainment region mask at global (ij)= 61 8
Overflow: Ross Sea Product adjacent mask at global (ij)= 59 7
Overflow: Ross Sea Product adjacent mask at global (ij)= 60 7
Overflow: Ross Sea Product adjacent mask at global (ij)= 61 7
Overflow: Ross Sea Product adjacent mask at global (ij)= 59 8
Overflow: Ross Sea Product adjacent mask at global (ij)= 60 8
Overflow: Ross Sea Product adjacent mask at global (ij)= 61 8
Overflow: Ross Sea Product adjacent mask at global (ij)= 59 9
Overflow: Ross Sea Product adjacent mask at global (ij)= 60 9
Overflow: Ross Sea Product adjacent mask at global (ij)= 61 9
Overflow: Ross Sea Product adjacent mask at global (ij)= 59 10
Overflow: Ross Sea Product adjacent mask at global (ij)= 60 10
Overflow: Ross Sea Product adjacent mask at global (ij)= 61 10
Overflow: Denmark Strait Inflow region mask at global (ij)= 3 104
Overflow: Denmark Strait Inflow region mask at global (ij)= 4 104
Overflow: Denmark Strait Inflow region mask at global (ij)= 5 104
Overflow: Denmark Strait Entrainment region mask at global (ij)= 3 104
Overflow: Denmark Strait Entrainment region mask at global (ij)= 4 104
Overflow: Denmark Strait Entrainment region mask at global (ij)= 5 104
Overflow: Denmark Strait Inflow region mask at global (ij)= 3 105
Overflow: Denmark Strait Inflow region mask at global (ij)= 4 105
Overflow: Denmark Strait Inflow region mask at global (ij)= 5 105
Overflow: Denmark Strait Entrainment region mask at global (ij)= 3 105
Overflow: Denmark Strait Entrainment region mask at global (ij)= 4 105
Overflow: Denmark Strait Entrainment region mask at global (ij)= 5 105
Overflow: Denmark Strait Entrainment adjacent mask at global (ij)= 5 105
Overflow: Denmark Strait Inflow region mask at global (ij)= 3 106
Overflow: Denmark Strait Inflow region mask at global (ij)= 4 106
Overflow: Denmark Strait Inflow region mask at global (ij)= 5 106
Overflow: Denmark Strait Entrainment region mask at global (ij)= 3 106
Overflow: Denmark Strait Entrainment region mask at global (ij)= 4 106
Overflow: Denmark Strait Entrainment region mask at global (ij)= 5 106
Overflow: Denmark Strait Entrainment adjacent mask at global (ij)= 5 106
Overflow: Denmark Strait Inflow region mask at global (ij)= 3 107
Overflow: Denmark Strait Inflow region mask at global (ij)= 4 107
Overflow: Denmark Strait Inflow region mask at global (ij)= 5 107
Overflow: Denmark Strait Entrainment region mask at global (ij)= 3 107
Overflow: Denmark Strait Entrainment region mask at global (ij)= 4 107
Overflow: Denmark Strait Entrainment region mask at global (ij)= 5 107
Overflow: Denmark Strait Entrainment adjacent mask at global (ij)= 5 107
Overflow: Denmark Strait Source region mask at global (ij)= 8 109
Overflow: Denmark Strait Source region mask at global (ij)= 9 109
Overflow: Denmark Strait Source region mask at global (ij)= 10 109
Overflow: Denmark Strait Source adjacent mask at global (ij)= 8 109
Overflow: Denmark Strait Source region mask at global (ij)= 8 110
Overflow: Denmark Strait Source region mask at global (ij)= 9 110
Overflow: Denmark Strait Source region mask at global (ij)= 10 110
Overflow: Denmark Strait Source adjacent mask at global (ij)= 8 110
Overflow: Denmark Strait Source region mask at global (ij)= 8 111
Overflow: Denmark Strait Source region mask at global (ij)= 9 111
Overflow: Denmark Strait Source region mask at global (ij)= 10 111
Overflow: Denmark Strait Source adjacent mask at global (ij)= 8 111
Overflow: Denmark Strait Source region mask at global (ij)= 8 112
Overflow: Denmark Strait Source region mask at global (ij)= 9 112
Overflow: Denmark Strait Source region mask at global (ij)= 10 112
Overflow: Denmark Strait Source adjacent mask at global (ij)= 8 112
Overflow: Denmark Strait Product adjacent mask at global (ij)= 4 102
Overflow: Denmark Strait Product adjacent mask at global (ij)= 4 103
Overflow: Denmark Strait Product adjacent mask at global (ij)= 4 104
Overflow: Denmark Strait Product adjacent mask at global (ij)= 3 102
Overflow: Denmark Strait Product adjacent mask at global (ij)= 3 103
Overflow: Denmark Strait Product adjacent mask at global (ij)= 3 104
Overflow: Denmark Strait Product adjacent mask at global (ij)= 2 100
Overflow: Denmark Strait Product adjacent mask at global (ij)= 2 101
Overflow: Denmark Strait Product adjacent mask at global (ij)= 2 102
Overflow: Denmark Strait Product adjacent mask at global (ij)= 1 95
Overflow: Denmark Strait Product adjacent mask at global (ij)= 1 96
Overflow: Denmark Strait Product adjacent mask at global (ij)= 1 97
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 7 93
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 8 93
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 9 93
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 10 93
Overflow: Faroe Bank Channel Source region mask at global (ij)= 13 93
Overflow: Faroe Bank Channel Source region mask at global (ij)= 14 93
Overflow: Faroe Bank Channel Source region mask at global (ij)= 15 93
Overflow: Faroe Bank Channel Source adjacent mask at global (ij)= 13 93
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 7 93
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 8 93
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 9 93
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 10 93
Overflow: Faroe Bank Channel Entrainment adjacent mask at global (ij)= 10 93
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 7 94
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 8 94
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 9 94
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 10 94
Overflow: Faroe Bank Channel Source region mask at global (ij)= 13 94
Overflow: Faroe Bank Channel Source region mask at global (ij)= 14 94
Overflow: Faroe Bank Channel Source region mask at global (ij)= 15 94
Overflow: Faroe Bank Channel Source adjacent mask at global (ij)= 13 94
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 7 94
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 8 94
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 9 94
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 10 94
Overflow: Faroe Bank Channel Entrainment adjacent mask at global (ij)= 10 94
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 7 95
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 8 95
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 9 95
Overflow: Faroe Bank Channel Inflow region mask at global (ij)= 10 95
Overflow: Faroe Bank Channel Source region mask at global (ij)= 13 95
Overflow: Faroe Bank Channel Source region mask at global (ij)= 14 95
Overflow: Faroe Bank Channel Source region mask at global (ij)= 15 95
Overflow: Faroe Bank Channel Source adjacent mask at global (ij)= 13 95
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 7 95
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 8 95
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 9 95
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 10 95
Overflow: Faroe Bank Channel Entrainment adjacent mask at global (ij)= 10 95
Overflow: Faroe Bank Channel Source region mask at global (ij)= 13 96
Overflow: Faroe Bank Channel Source region mask at global (ij)= 14 96
Overflow: Faroe Bank Channel Source region mask at global (ij)= 15 96
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 7 96
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 8 96
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 9 96
Overflow: Faroe Bank Channel Entrainment region mask at global (ij)= 10 96
Overflow: Faroe Bank Channel Source region mask at global (ij)= 13 97
Overflow: Faroe Bank Channel Source region mask at global (ij)= 14 97
Overflow: Faroe Bank Channel Source region mask at global (ij)= 15 97
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 9 93
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 9 94
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 9 95
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 8 93
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 8 94
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 8 95
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 7 93
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 7 94
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 7 95
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 5 90
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 5 91
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 5 92
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 3 89
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 4 89
Overflow: Faroe Bank Channel Product adjacent mask at global (ij)= 5 89
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)
0: ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 21503 RUNNING AT localhost.localdomain
= EXIT CODE: 127
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

ALLOCATE: 22968 bytes requested; status = 35(CUDA driver version is insufficient for CUDA runtime version)

I don’t think it’s failing because you’re running out of memory; rather, it’s failing because your CUDA driver is older than the CUDA runtime version the program is using.

What driver version do you have? (If you don’t know, run the PGI “pgaccelinfo” utility. The driver version is printed toward the beginning of the output).

You can get the latest driver here: http://www.nvidia.com/Download/index.aspx
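If it helps, the same comparison can be made from inside a small CUDA Fortran program. This is only a sketch: the file and program names are hypothetical, and it assumes the cudaDriverGetVersion and cudaRuntimeGetVersion wrappers from the cudafor module, which report versions encoded as 1000*major + 10*minor.

```fortran
! check_versions.F90 (hypothetical file name)
! Build, assuming the PGI compilers: pgfortran -Mcuda check_versions.F90
program check_versions
  use cudafor
  implicit none
  integer :: istat, driver_version, runtime_version

  driver_version = 0
  runtime_version = 0

  ! Highest CUDA version supported by the installed driver.
  istat = cudaDriverGetVersion(driver_version)
  if (istat /= cudaSuccess) then
    print *, 'cudaDriverGetVersion failed: ', trim(cudaGetErrorString(istat))
  end if

  ! CUDA runtime version the executable was linked against.
  istat = cudaRuntimeGetVersion(runtime_version)
  if (istat /= cudaSuccess) then
    print *, 'cudaRuntimeGetVersion failed: ', trim(cudaGetErrorString(istat))
  end if

  ! Versions are encoded as 1000*major + 10*minor (for example, 7050 for CUDA 7.5).
  print *, 'Driver supports CUDA version: ', driver_version
  print *, 'Runtime CUDA version:         ', runtime_version

  if (driver_version < runtime_version) then
    print *, 'The driver is older than the runtime; device allocations are likely to fail.'
  else
    print *, 'The driver is at least as new as the runtime.'
  end if
end program check_versions
```

For a multi-node job, it should be built with the same flags as the application and run on the same node where the ALLOCATE error appears, since the driver can differ from one machine to another.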

  • Mat

Hi, I am also facing the same error when running CESM.

I am using PGI version 17.4 (pgi/17.4).

I am using the -Mcuda=kepler+ and -cudalibs flags.

My CUDA driver version is 384.0.

Is there any reason why my code is failing?

Hi Aketh,

This error should only occur when running an older driver with a newer CUDA runtime version. Since PGI 17.4 defaults to using CUDA 7.5 and your driver supports CUDA 9.0, I’m not sure why you’re seeing it.

Can you post the error as well as the output from “nvidia-smi” from the system you’re running on?

Note that I just tried running a program on one of our K40 systems with 384.81 installed using PGI 17.4, and it ran without issue. This is why I suspect something else is going on.

Thanks,
Mat