Program runs well on GTX 1660 Ti but fails on A100

When I compile on my personal computer (Ubuntu 20.04 / NVIDIA GeForce GTX 1660 Ti / CUDA 11.6 / driver version 510.85.02), the program runs fine. But when I copy a.out to the server (CentOS 7 / 2x NVIDIA A100 / CUDA 11.7 / driver version 515.65.10) and run it there, it fails. The error message is “invalid device symbol”.

Here is my code:

module var
    use cudafor
    implicit none
    
    integer,allocatable,managed         :: a(:)

end module var

program aaa
    use cudafor
    use var
    implicit none

    integer                             :: istat
    character(len=100)                  :: err_msg

    allocate(a(10),stat=istat,errmsg=err_msg)
    if(istat.ne.0) goto 10
    
    write(*,*)"size",size(a)
    stop
10 write(*,*)err_msg
end program aaa

Here is how I compile:

nvfortran -cuda asd.f90

What’s wrong with my code?

There is evidently nothing “wrong” with your code. The problem lies either in how you built the code or in the environment you are running it in. Moving a Linux executable from one machine to another can run into many problems that are not directly related to CUDA or GPUs.

See here for how nvfortran builds codes to target GPUs. Building on the system with GTX 1660 Ti means you built an executable for that target. Try using the -gpu option to also target the A100.
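To pick the right value for -gpu, it can help to ask each machine what it actually has. The following is a minimal sketch, not part of the original program: the file name query_cc.f90 and the program name are made up for illustration. It prints the compute capability of every visible GPU using the cudafor runtime API. A GTX 1660 Ti should report 7.5 and an A100 8.0, which correspond to cc75 and cc80; -gpu accepts a comma-separated list, so both can be targeted in one build.

```fortran
! Sketch: list each visible GPU and its compute capability.
! Assumed build line: nvfortran -cuda query_cc.f90
! Example of targeting both GPUs in the real build: -gpu=cc75,cc80
program query_cc
    use cudafor
    implicit none

    type(cudaDeviceProp) :: prop
    integer              :: istat, ndev, i

    ! Ask the runtime how many devices are visible
    istat = cudaGetDeviceCount(ndev)
    if (istat /= cudaSuccess) then
        write(*,*) "cudaGetDeviceCount failed: ", trim(cudaGetErrorString(istat))
        stop 1
    end if

    ! Device numbering starts at 0
    do i = 0, ndev - 1
        istat = cudaGetDeviceProperties(prop, i)
        if (istat /= cudaSuccess) cycle
        write(*,'(a,i0,3a,i0,a,i0)') "device ", i, ": ", trim(prop%name), &
            "  cc ", prop%major, ".", prop%minor
    end do
end program query_cc
```

This program contains no device code of its own, so it should be less sensitive to the target architecture than the reproducer.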

Thank you very much! I finally got it working using

nvfortran -cuda -gpu=cc80,cuda11.7 asd.f90

A more complex program has the same problem. The small program above was actually extracted from it in order to reproduce the problem. Although the method above fixes the small program, it does not fix the original program.

Here is the code:

! managedvariable.f90
module ManagedVariable
    use ElementLibCudaFor
    use ErrorM
    implicit none

    type(Mater),allocatable,save,managed                            :: maters(:)
    type(Section),allocatable,save,managed                          :: sections(:)
    type(Element),allocatable,save,managed                          :: elems(:)
    type(InterInfoType),allocatable,save,managed                    :: interInfo(:)
    type(DofInfoType),allocatable,save,managed                      :: dofInfo(:)
    type(FeaParameter),allocatable,save,managed                     :: feaPara(:)
    type(AnalysisInfo),allocatable,save,managed                     :: analyInfo(:)
    type(LoadStep),allocatable,save,managed                         :: loadSteps(:)
    type(NodeLoadCase),allocatable,save,managed                     :: loadCases(:)
    type(NodeResult),allocatable,save,managed                       :: resBack(:)

	...
end module ManagedVariable

! exchange.f90
subroutine cudaFortran_mp_exchangeParaEMSN(sizeElem,sizeMater,sizeSection,sizeNode,sizeConnectElem,sizeContactCouple,sizeCombiElem)
	use CpuVariable
	use ManagedVariable
	use cstl
	use cudafor
	implicit none
	integer,intent(in)			:: sizeElem,sizeMater,sizeSection,sizeNode,sizeConnectElem,sizeContactCouple,sizeCombiElem
	integer						:: istat
	integer                     :: countDevice,optimumDevice,actualDevice
	integer                     :: ierr
	type(cudaDeviceProp)        :: prop
	character(15)               :: char
	character(len=100)          :: string
	character(len=100)			:: err_msg
	! initialization cuda device
	countDevice=-1;optimumDevice=0
	! call nvmlGetUtilizationRates(countDevice,optimumDevice)
	ierr=cudaSetDevice(optimumDevice)
	ierr=cudaGetDevice(actualDevice)
	ierr=cudaGetDeviceProperties ( prop, actualDevice )
	write(char,*) actualDevice
	write(string,fmt='(a,a)') "DeviceSelect:    ",trim(adjustl(char))//"/"//trim(adjustl(prop%name))
	call messagePrint (string)

	allocate(elems(sizeElem),stat=iStat,errmsg=err_msg)
	if(iStat.ne.0) write(*,*)err_msg

	...
	return
10  call errorInfoPrint("exchangeParaEMSN error")
end subroutine cudaFortran_mp_exchangeParaEMSN

! Explicit.f90
module cudaFortran
    use ElementLib
    interface
        module subroutine exchangeParaEMSN(sizeElem,sizeMater,sizeSection,sizeNode,sizeConnectElem,sizeContactCouple,sizeCombiElem)
            integer,intent(in)                          ::sizeElem,sizeMater,sizeSection,sizeNode,sizeConnectElem,sizeContactCouple,sizeCombiElem
        end subroutine exchangeParaEMSN
    end interface
end module cudaFortran

subroutine cudaFortranPrepare(feaFrame,veloHalf,lastPvect,lastReact)
    implicit none
    type(FeaFrameType),intent(inout)            :: feaFrame
    real(8),intent(in)                          :: veloHalf(:),lastPvect(:),lastReact(:)
    integer                                     :: sizeElem,sizeMater,sizeSection,sizeNode,sizeConnectElem,sizeContactCouple,sizeCombiElem,i

    associate(elems=>feaFrame%feaData%elems,nodes=>feaFrame%feaData%nodes,maters=>feaFrame%feaData%maters,sections=>feaFrame%feaData%sections)
        sizeElem=size(elems)
        sizeMater=size(maters)
        sizeSection=size(sections)
        sizeNode=size(nodes)
        sizeConnectElem=size(feaFrame%feaData%connectElems)+size(feaFrame%feaData%supportElems)
        sizeContactCouple=size(feaFrame%feaData%contactCps)
        !        combiElem
        sizeCombiElem=0
        do i=1,sizeElem
            if(allocated(elems(i)%elemObjs).and.size(elems(i)%elemObjs).gt.0)then
                sizeCombiElem=sizeCombiElem+size(elems(i)%elemObjs)
            end if
        end do
        call exchangeParaEMSN(sizeElem,sizeMater,sizeSection,sizeNode,sizeConnectElem,sizeContactCouple,sizeCombiElem)

    end associate

    return
end subroutine cudaFortranPrepare

exchange.f90 and managedvariable.f90 were compiled using:

nvfortran -cuda -c++libs -fast -fPIC -gpu=cc80,cuda11.7 -c XXX.f90
nvfortran -cuda -c++libs -fast -fPIC -gpu=cc80,cuda11.7 -shared -o libcudafortran.so managedvariable.o exchange.o

Explicit.f90 was compiled using ifort:

ifort Explicit.f90 -L./ -lcudafortran

What should I do to make it work? Can you give some advice?

Do you mean that the complex program works correctly when you compile and run it on your system with the GTX 1660 Ti?

Yes, it works correctly on my system with the GTX 1660 Ti. Of course, I compile it with -gpu=cc75,cuda11.7 on my system.

Is it possible to compile/build it in the A100 environment, rather than trying to move the executable from one environment to another?
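Whichever way you build, note that the subroutine you posted discards the status codes returned by cudaSetDevice, cudaGetDevice, and cudaGetDeviceProperties. Checking them may show which call first fails on the A100. Here is a minimal sketch of such a check; the names cuda_check_m, cuda_check, and check_demo are hypothetical and not part of your code or of cudafor.

```fortran
! Sketch: stop with a readable message when a CUDA runtime call fails.
! cuda_check_m / cuda_check are illustrative names, not a library API.
module cuda_check_m
    use cudafor
    implicit none
contains
    subroutine cuda_check(ierr, what)
        integer, intent(in)          :: ierr   ! status returned by a CUDA runtime call
        character(len=*), intent(in) :: what   ! label identifying the call site
        if (ierr /= cudaSuccess) then
            write(*,*) trim(what), ": ", trim(cudaGetErrorString(ierr))
            stop 1
        end if
    end subroutine cuda_check
end module cuda_check_m

program check_demo
    use cudafor
    use cuda_check_m
    implicit none
    integer :: dev

    ! Same calls as in the posted subroutine, but with the status tested
    call cuda_check(cudaSetDevice(0), "cudaSetDevice")
    call cuda_check(cudaGetDevice(dev), "cudaGetDevice")
    write(*,*) "active device:", dev
end program check_demo
```

In the shared library, the same helper could wrap each ierr=... call before the allocate statements, so that a failure is reported at the call that caused it.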