Problems with linking __pgi_uacc_set_cuda

Hello

I am having trouble compiling my code for a multicore CPU on Windows (using PGICE-184) with the following flags:

pgfortran -o GENA213.exe GENA213.cuf -fast -Minfo=opt -ta=multicore -Minfo=accel

However, it works fine with:

pgfortran -o GENA213.exe GENA213.cuf -fast -Minfo=opt -ta:tesla:cc50 -Minfo=accel

The error I am getting is:

pgcudafat2e30PcLlYdTIMe.o : error LNK2001: unresolved external symbol __pgi_uacc_set_cuda
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_1881 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_1994 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2082 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2133 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2206 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2274 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2360 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2449 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2526 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2598 referenced in function MAIN_
pgfortran9hriwcZq6GkqWU.obj : error LNK2019: unresolved external symbol .LB1_2665 referenced in function MAIN_
libaccnc.lib(cuda_init_c.obj) : error LNK2019: unresolved external symbol __pgi_uacc_cuda_initdinfoflags referenced in function __pgi_uacc_cuda_init_framework
libaccnc.lib(cuda_init_c.obj) : error LNK2019: unresolved external symbol __pgi_uacc_cuda_release_buffer referenced in function __pgi_uacc_cuda_init
libaccnc.lib(cuda_init_c.obj) : error LNK2019: unresolved external symbol __pgi_uacc_cuda_stream referenced in function __pgi_uacc_cuda_initdev
libaccnc.lib(cuda_launch_k.obj) : error LNK2001: unresolved external symbol __pgi_uacc_cuda_stream
libaccnc.lib(cuda_launch.obj) : error LNK2001: unresolved external symbol __pgi_uacc_cuda_stream
libaccnc.lib(cuda_launch_k.obj) : error LNK2019: unresolved external symbol __pgi_uacc_cuda_argmem referenced in function __pgi_uacc_cuda_launchk3
libaccnc.lib(cuda_launch.obj) : error LNK2001: unresolved external symbol __pgi_uacc_cuda_argmem
GENA213.exe : fatal error LNK1120: 16 unresolved externals
./GENA213.exf: error STP001: cannot open file

Could you kindly explain what I need to change or add in my code for it to compile?
I have tried adding the following to my main program, but it doesn’t help:

        
        PROGRAM GENA213
        use cudafor
        use Openacc
        use omp_lib

And I use directives in my code that look like this (although not all in the same place):

        !$acc data copyin(twobdmat,kbeam) copyout(multtwobdmat)
...
        !$acc parallel loop
...
            !$acc loop
...
            !$acc loop reduction(+:rsum)
...
        !$acc end data

Thank you, I am grateful for your time.
Ahmed

Hi Ahmed,

The actual error occurs because, when you use “-ta=multicore”, a different set of libraries is linked in, and those libraries aren’t CUDA enabled. Hence the CUDA Fortran parts of the code are left with unresolved references. You can work around this by using “-ta=tesla,multicore”, so that the compiler creates a unified binary containing both GPU and multicore OpenACC code. You’d then set the environment variable “ACC_DEVICE_TYPE=HOST” to have the OpenACC portion run on the multicore CPU.

I want to make sure you understand that the CUDA Fortran portion of the code won’t run on the multicore CPU. And if you’re mixing data or compute between the CUDA Fortran and OpenACC portions of the code, you may run into problems, since the two portions would then be running on different devices.

-Mat

Hello Mat,
Thank you for your reply. It compiled with “-ta=tesla,multicore”. If I don’t use CUDA-enabled libraries, such as “use cudafor”, will I still have to set the environment flag “ACC_DEVICE_TYPE=HOST” to have the OpenACC portion run on the multicore CPU?

Could you also explain how to set the environment flag? Is it set in the code, or do I change an external file?

If there is an example to direct me to, that would be great.
(By the way, my code went from 787 s on the CPU to 483 s on multicore to 54 s on the GPU, a GTX TITAN X.)

Thank you for your time.
Ahmed.

Hi Ahmed,

If I do not link CUDA-enabled libraries, such as “use cudafor”…will I still have to set the environment flag “ACC_DEVICE_TYPE=HOST” to have the OpenACC portion run on the multicore CPU?

No. If you’re compiling pure OpenACC code, you won’t have this issue. It’s only a problem here because you’re combining CUDA Fortran with OpenACC.
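For contrast, here is a minimal sketch of a pure OpenACC program using the same kind of directives as in the question (data region, parallel loop, inner loop with a reduction) but with no “use cudafor”. The file name, program name, array names and sizes are hypothetical, and the build line in the comment assumes the PGI 18.4 flags used in this thread. The source is a plain .f90 file rather than .cuf, since, as I understand it, the .cuf extension makes pgfortran treat the file as CUDA Fortran; with nothing CUDA-specific in it, it should link with “-ta=multicore” alone.

```fortran
! matvec.f90 -- hypothetical pure OpenACC example (no "use cudafor").
! Assumed build line:  pgfortran -fast -ta=multicore -Minfo=accel matvec.f90
program matvec
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: n = 2000
  real(real64), allocatable :: a(:,:), x(:), y(:)
  real(real64) :: rsum
  integer :: i, j

  allocate(a(n,n), x(n), y(n))
  a = 1.0_real64
  x = 2.0_real64

  ! Same directive pattern as in the question: a data region, a parallel
  ! outer loop, and an inner loop carrying a sum reduction.
  !$acc data copyin(a, x) copyout(y)
  !$acc parallel loop private(rsum)
  do i = 1, n
     rsum = 0.0_real64
     !$acc loop reduction(+:rsum)
     do j = 1, n
        rsum = rsum + a(i,j) * x(j)
     end do
     y(i) = rsum
  end do
  !$acc end data

  print *, 'y(1) =', y(1), '  y(n) =', y(n)
end program matvec
```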

The workaround I gave you (-ta=tesla,multicore) combines a multicore x86 version of the OpenACC code with a GPU version. At runtime, the program chooses which version to use, with the GPU as the default. Setting ACC_DEVICE_TYPE=HOST overrides this default so that the multicore CPU version is used instead.
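To check which target the runtime has selected, a small sketch like the one below could be built with the same flags (e.g. -ta=tesla,multicore) and run once with and once without ACC_DEVICE_TYPE=HOST set. It uses the standard OpenACC routine acc_get_device_type and assumes the openacc module provides the acc_device_host and acc_device_nvidia constants; the program name is hypothetical.

```fortran
! Hypothetical check: report which device type the OpenACC runtime reports.
program which_device
  use openacc
  implicit none
  integer(acc_device_kind) :: dev

  ! Query the device type currently selected by the runtime
  ! (affected by ACC_DEVICE_TYPE or by acc_set_device_type).
  dev = acc_get_device_type()

  if (dev == acc_device_host) then
     print *, 'OpenACC regions will run on the host (multicore CPU)'
  else if (dev == acc_device_nvidia) then
     print *, 'OpenACC regions will run on an NVIDIA GPU'
  else
     print *, 'OpenACC device type code:', dev
  end if
end program which_device
```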

If you were able to compile with just “-ta=multicore”, multicore would be the only target in the binary, so there would be no need to set the device type.

Can you also, please, mention how to set the environment flag?

It will depend on the shell you’re using.

Windows command prompt (DOS): “set ACC_DEVICE_TYPE=HOST”
Windows/Linux Bash: “export ACC_DEVICE_TYPE=HOST”
Linux Csh: “setenv ACC_DEVICE_TYPE HOST”
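In Bash, the variable can also be set for a single run only, by prefixing the command, which leaves the rest of the shell session unchanged. The helper below is a hypothetical convenience wrapper (run_on_device is not part of the PGI tools); GENA213.exe in the usage comment is the unified binary from this thread.

```shell
#!/usr/bin/env bash
# Sketch: run one command with ACC_DEVICE_TYPE set only for that process,
# without changing the variable for the rest of the shell session.
run_on_device() {
  local device="$1"
  shift
  # A VAR=value prefix applies only to this single command invocation.
  ACC_DEVICE_TYPE="$device" "$@"
}

# Hypothetical usage with a binary built with -ta=tesla,multicore:
#   run_on_device HOST ./GENA213.exe   # OpenACC regions on the multicore CPU
#   ./GENA213.exe                      # default device (GPU), if the variable is unset
```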

Is it in the code or do I change an external file?

I’m not clear on what you’re asking here. ACC_DEVICE_TYPE is an environment variable set in your shell. If you prefer a programmatic solution, you can instead call the OpenACC API routine acc_set_device_type from your program.
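A minimal sketch of the programmatic route, assuming a binary built with both targets (e.g. -ta=tesla,multicore) so that acc_device_host selects the multicore version, as ACC_DEVICE_TYPE=HOST does. The program and variable names are hypothetical. The call only affects OpenACC regions; CUDA Fortran kernels still need the GPU.

```fortran
! Hypothetical example: select the host (multicore) target from inside the code.
program select_device
  use openacc
  implicit none
  integer, parameter :: n = 1000000
  real, allocatable :: x(:)
  real :: total
  integer :: i

  ! Programmatic equivalent of ACC_DEVICE_TYPE=HOST: call it before the
  ! first OpenACC construct so later compute regions use the host target.
  call acc_set_device_type(acc_device_host)

  allocate(x(n))
  x = 1.0
  total = 0.0

  !$acc parallel loop reduction(+:total) copyin(x)
  do i = 1, n
     total = total + x(i)
  end do

  print *, 'sum =', total
end program select_device
```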

-Mat