Redistributable runtime libraries for pgfortran

I have been using pgfortran for two years and have working CUDA-based applications that I use extensively. I now want to run these programs on a separate compute box to leave my development computer free for development. For Intel compilers, Intel provides what they call a “Redistributable library” package that can be installed on the target machine and that provides the runtime environment needed to support applications compiled with their compilers. The alternative is to compile with the “static” option that links the needed libraries at compile time. I can find no discussion of the equivalent for pgfortran. How do people normally set up the runtime environment for their pgfortran codes to run on machines of opportunity?

Hi GeoJohn,

You can find information on distribution and deployment in the PGI Release Notes: https://www.pgroup.com/resources/docs/18.7/x86/pgi-release-notes/index.htm#app-deploy-redist

Basically, the redistributable libraries are located in a directory called “REDIST” under the PGI installation (for example “/opt/pgi/linux86-64/18.7/REDIST”). The libraries may be bundled with your application for use on the target system.
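
A minimal sketch of how the bundled libraries might be made visible on the target machine, assuming a Linux target where the dynamic loader honors LD_LIBRARY_PATH. The REDIST_DIR default simply reuses the example path above, and ./my_app is a placeholder for your own executable, not a real file name from this thread.

```shell
#!/bin/sh
# Assumed location of the copied REDIST directory on the target machine.
REDIST_DIR="${REDIST_DIR:-/opt/pgi/linux86-64/18.7/REDIST}"

# Prepend a directory to a colon-separated search path, avoiding a stray
# trailing colon when the existing path is empty.
prepend_path() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

LD_LIBRARY_PATH="$(prepend_path "$REDIST_DIR" "${LD_LIBRARY_PATH:-}")"
export LD_LIBRARY_PATH

# Optional check: list shared libraries the loader still cannot resolve.
# ./my_app is a hypothetical executable name.
if [ -x ./my_app ]; then
    ldd ./my_app | grep 'not found' || echo "all shared libraries resolved"
fi
```

Putting the export in a login script such as .profile makes the setting persistent. Any library that ldd still reports as "not found" is not covered by REDIST and has to come from somewhere else, for example the NVIDIA driver package.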

Hope this helps,
Mat

Mat,

Thanks for your help. I found the REDIST directory on my development box and have copied it, resolving the symbolic links, to a flash drive and will give it a try on the target machine. At this point my target machine’s operating system is pretty screwed up from my various attempts to get the transferred code to run. My plan is to format the hard drive, reinstall Linux and add the REDIST directory to a clean install. I will get back to you in a day or so.

Thanks much!

John Dunbar

I tried, as you suggested, copying the REDIST directory to the target machine, placing it under /opt … with the same path as on my development box. Then I added the required environment variable to my .profile to point to the REDIST directory. I also installed the Nvidia driver consistent with Ubuntu 14.04, which is what I am running on my development box. When I tried to run my application, the program started fine, which tells me that it was finding the Fortran dependencies. This at least is a start. However, it found no CUDA devices and exited. The target box has two K80s installed. So the REDIST plus the Nvidia driver is not getting everything needed to run a code compiled under pgi fortran 90. My plan now is to try to download and install the full pgi development environment on the target box and see if that will do it. The problem is, I am still running last year’s version, which may or may not be available. If not, I will have to try to rebuild both the development box and the target box with the 2018 version. Before I do that, I think I will order a new SSD for the development box and just pull the existing one for safekeeping. That way I will be able to get back to where I am now. I will let you know how it works out. Thanks for your help.

Hi John,

On the target system, can you run the “nvidia-smi” utility to check which CUDA driver you have installed?
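
A minimal sketch of that check, assuming a POSIX shell on the target system. The report_gpus function name is made up for illustration; the query fields are standard nvidia-smi options.

```shell
#!/bin/sh
# Report the installed NVIDIA driver and the visible GPUs, or say why we cannot.
report_gpus() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the NVIDIA driver is probably not installed"
        return 1
    fi
    # One line per GPU: index, product name, driver version.
    nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader
}

# Do not abort the calling script if the check fails.
report_gpus || true
```

If nvidia-smi is missing or lists no devices, the application will not see any GPUs either, whatever runtime libraries are installed.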

What’s the exact error message that you’re getting?

Also, when you built the binaries, what compiler flags did you use?

I’m wondering if this is a system issue (such as an old CUDA driver installed), or if it’s a build issue.

When you build the binary on the build system, be sure to compile with “-ta=tesla:cc35” (or create a unified binary targeting multiple devices by either compiling with just -ta=tesla, or -ta=tesla:cc35,cc50,cc60). Also, unless specified as a command-line option, the default CUDA version used will vary by compiler. It’s usually the CUDA version that’s one release behind the CUDA version that was current at the time the PGI Compiler was released. Hence, if you’re building with a newer CUDA version than what’s installed on the target system, then this can cause issues.

-Mat

Mat

You were correct. I was missing the CUDA drivers. After installing the drivers using “sudo apt-get install nvidia-384 nvidia-modprobe” the compiled code runs and executes the CUDA kernel calls. However, I still have three remaining problems:

(1) I can establish p2p communication between the two gpus on the first k80 card, but not between the second gpu on the first card and the first gpu on the second card.

(2) When I limit the code to just running on the first k80 card, it runs to completion without reporting an error, but the data transmitted by p2p is not correct. I plan to do some testing on this to get more info.

(3) When I rebooted the first time after installing the nvidia drivers I got the famous infinite-login-loop problem. I can get around it by switching to the shell prompt screen (Ctrl-Alt-F3), logging in and purging the nvidia drivers with “sudo apt-get purge nvidia*”. Then I can reboot to the desktop, reload the drivers and I am good to go. Since then I have just been purging the drivers each time I shut the system down and reinstalling them at the start of the next session. This is ok for testing, but it would be good to solve at some point.

On the p2p problems, I have written a little CUDA Fortran program that just sets up p2p and transfers a couple of short vectors back and forth between gpus. I will compile this test code on my development box, transfer it to the K80 box, and run some tests to see what is going on. I have been compiling the codes for the K80 box with pgfortran release 17.4 from last year, using -Mcuda=kepler, but wonder if there is some incompatibility somewhere.
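
Alongside a test program, it may help to look at how the GPUs are connected. This is a minimal sketch, assuming the installed driver's nvidia-smi supports the topo subcommand; the show_topology function name is made up for illustration.

```shell
#!/bin/sh
# Print the GPU interconnect matrix, or explain why it is unavailable.
show_topology() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: install the NVIDIA driver first"
        return 1
    fi
    # Matrix of link types between every pair of GPUs in the system.
    nvidia-smi topo -m
}

# Do not abort the calling script if the check fails.
show_topology || true
```

The matrix shows the link type for each GPU pair. Pairs that only connect across different PCIe host bridges or CPU sockets are commonly reported as unable to use peer access, which would be consistent with p2p working within one K80 card but not between cards.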

Hi John,

I’m not sure I’ll be much help here since these seem to be system issues related to your driver and possibly BIOS.

For the peer-to-peer issue, I saw this post over on NVIDIA’s devtalk user forums where the issue seemed to be a problem with the system BIOS (solution on the second page). Not sure if it will help with your issue, but it could be related.

-Mat

Mat,

I agree. You have solved my original problem. I did write a little code that transfers short vectors between gpu0 and gpu1 using the p2p copy. That worked fine. So more experimentation is needed to track down the problem in my original code. If I cannot solve the problem, it would be best addressed in a new thread on that topic.

The login-loop problem has nothing to do with pgi. I am going to check with Supermicro about a BIOS update. If that does not work, I will try starting a thread on Nvidia’s and/or Ubuntu’s user forums.

Thanks much for your help.