Linking ACML 5.3.1 with PGI Visual Fortran 17.5: runtime error

To link the ACML library I made the following settings:
- Additional include directory: acml5.3.1\win64_mp\include
- Language: Enable OpenMP: Yes
- Linker:
  - General, Additional library directory: acml5.3.1\win64_mp\lib
  - Input, Additional dependencies: libacml_mp_dll.lib
The program built successfully, and the DLLs required by the executable were placed in the executable folder.
When I ran the program, everything went fine (the matrix-filling part runs under OpenACC) until the solver (cgetrf/cgetrs) was called. I expected the solver to run in parallel as well, but instead the following error appeared:
ERROR: INTERNAL ERROR : INVALID THREAD ID
What can cause this?

Hi jo_pink,

The ACML library looks to have last been built back in 2013 and is too old to be used with the PGI 17.5 compilers. I was able to get them to work with the PGI 15.10 compilers but nothing later. (You can download 15.10 from our archives from the “DOWNLOADS” tab above)

I’d suggest asking AMD to update the libraries, but it appears that they no longer support ACML.

You could also try using the BLAS/LAPACK libraries we ship with the 17.5 compilers. They’re based on OpenBLAS and quite good.

-Mat

Hi Mat,

Thank you for your response. I was working on two fronts here: MKL and ACML. Neither was functioning. I will drop ACML altogether. As for MKL, it was only running on one thread. In the meantime I have been saved by an earlier remark of yours about adding -mkl=allcores. I did that alongside linking the appropriate libraries, and MKL is now working perfectly (since yesterday).
All of this is, however, a stop-gap solution. I am looking for an OpenACC-compatible math library that automatically switches between GPU and CPU depending on the available hardware! For now I am using CULA (host version) for the solver part on the GPU.

We live in hope !

Thanks for the reply,

Jo

If you’re using Fortran and BLAS, then we have a generic interface module that you can use. It will use cuBLAS when passing in device data and regular BLAS when it’s host data.

There are a few LAPACK routines as well, but for most of them you need to use conditional compilation or conditionals in your code to call the CPU or the GPU-enabled versions.

-Mat
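
As a rough illustration of the conditional-compilation approach mentioned above, here is a minimal sketch. It assumes the file is run through the preprocessor (for example by naming it with a .F90 extension), that a LAPACK library providing cgetrf/cgetrs is linked, and that the standard `_OPENACC` macro is what selects the GPU path. The routine `gpu_solve` is a hypothetical placeholder, not part of any library; its body is where a call into the GPU solver of your choice would go.

```fortran
! Sketch only: choose the CPU or GPU solver at compile time.
program solve_demo
  implicit none
  integer, parameter :: n = 2
  complex :: a(n,n), b(n,1)
  integer :: ipiv(n), info
  external :: cgetrf, cgetrs   ! LAPACK routines, resolved at link time

  ! Small test system: 2x + y = 3, x + 3y = 5 (exact solution x = 0.8, y = 1.4)
  a = reshape([(2.0,0.0), (1.0,0.0), (1.0,0.0), (3.0,0.0)], [n,n])
  b(:,1) = [(3.0,0.0), (5.0,0.0)]

#ifdef _OPENACC
  ! _OPENACC is defined by the compiler when OpenACC is enabled.
  call gpu_solve(n, a, b, ipiv, info)
#else
  ! Host path: LU factorization followed by the triangular solves.
  call cgetrf(n, n, a, n, ipiv, info)
  if (info == 0) call cgetrs('N', n, 1, a, n, ipiv, b, n, info)
#endif

  if (info /= 0) stop 'solve failed'
  print *, b(:,1)

#ifdef _OPENACC
contains
  ! Hypothetical placeholder: replace the body with a call into a
  ! GPU-enabled solver. As written it only reports failure through info.
  subroutine gpu_solve(n, a, b, ipiv, info)
    integer, intent(in)    :: n
    complex, intent(inout) :: a(n,n), b(n,1)
    integer, intent(out)   :: ipiv(n), info
    ipiv = 0
    info = -1
    print *, 'gpu_solve: no GPU solver wired in yet'
  end subroutine gpu_solve
#endif
end program solve_demo
```

A run-time alternative would be an ordinary `if` on a flag set at startup, which keeps one executable for both kinds of machine at the cost of linking both libraries.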

Mat,

I’ll try the LAPACK option first.

I found liblapack.lib and libblas.lib which, according to the 2017 user guide, are all that need to be linked.
I linked them and also added -mp=allcores to both the Fortran command line and the linker command line.

The first test uses the CPU only.

I ran the program, but the solver (cgetrf/cgetrs) does not run in parallel. Why?

Secondly: how do I activate the GPU versions?

Jo

Hi Jo,

The BLAS routines are OpenMP enabled but not the LAPACK routines. Granted, “cgetrf/cgetrs” should be calling some of the BLAS routines, so I would expect some parallelization.

You might try explicitly setting OMP_NUM_THREADS since the “allcores” may not be propagating to the libraries.

-Mat
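
One way to check whether the thread count is actually reaching the OpenMP runtime is a small test program like the sketch below. It assumes the program is compiled with OpenMP enabled (the same -mp setting used for the main project) and that OMP_NUM_THREADS was set in the environment before launch, for example with `set OMP_NUM_THREADS=4` in a Windows command prompt. It only reports what the runtime linked into this program sees; whether the BLAS library shares that runtime is a separate question.

```fortran
! Sketch: report the thread count the OpenMP runtime will use.
program check_threads
  use omp_lib
  implicit none

  ! Upper bound on the team size for the next parallel region;
  ! this reflects OMP_NUM_THREADS when the variable is set.
  print *, 'max threads: ', omp_get_max_threads()

  !$omp parallel
  !$omp single
  ! Actual size of the team executing this parallel region.
  print *, 'team size:   ', omp_get_num_threads()
  !$omp end single
  !$omp end parallel
end program check_threads
```

If both numbers match the value of OMP_NUM_THREADS but the solver still runs on one core, the setting is reaching the program and the limit is more likely inside the library itself.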