We have the PGI 7.2 CDK installed on a 64-node cluster. Everything worked well, but after one of the nodes (cn03) crashed, I restored it from an image I had backed up before the PGI 7.2 CDK was installed. Now I cannot use PGI on that node.
Your cluster should only have one system, typically the head node, that runs the license manager (lmgrd). Unless cn03 really is your head node, you shouldn’t need to run it there.
Also, typically the compilers are only installed on the head node, with the remaining nodes only needing some runtime support libraries. If this is the case for your cluster, then you only need to copy over the $PGI directory (by default /opt/pgi) from one of your other nodes. You can also just reinstall the CDK.
If you do have the compilers installed on each node, then you just need to copy the same license.dat file that your head node uses into the local $PGI directory. You may need to edit the host name in it, though, if the internal name of your head node is different from the external name.
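As a rough illustration of that host-name check, the sketch below pulls the host name and port out of the SERVER line of a FLEXlm-style license file, so you can compare them with the name the node actually uses for the head node. It assumes the common `SERVER <host> <hostid> [port]` layout. The function name and the sample text (including its host ID and port) are made up for the example and are not taken from any real license.dat.

```python
def parse_server_lines(license_text):
    """Return a list of (host, port) tuples, one per SERVER line.

    port is None when the SERVER line does not carry a numeric port.
    """
    servers = []
    for raw in license_text.splitlines():
        fields = raw.split()
        # Assumed layout: SERVER <host> <hostid> [port]
        if len(fields) >= 3 and fields[0].upper() == "SERVER":
            host = fields[1]
            port = None
            if len(fields) >= 4 and fields[3].isdigit():
                port = int(fields[3])
            servers.append((host, port))
    return servers


# Hypothetical license text, for illustration only.
SAMPLE = """\
# example license file
SERVER headnode 0123456789ab 27000
DAEMON pgroupd
"""

if __name__ == "__main__":
    print(parse_server_lines(SAMPLE))
```

If the host printed here is not a name the compute node can resolve, that is the name to edit in the node's copy of the file.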
The PGI compiler directory is exported to node cn03 from hpconsole, which is the license server (the head node). hpconsole doesn’t have an InfiniBand card, but cn03 does. I logged into cn03 and tried to build the mvapich RPM and install it in OFED (that is, to build the mvapich RPM using the PGI compilers). Before I cloned cn03, building the mvapich RPM succeeded; after the cloning, I tried again and it failed.
I had already started lmgrd.rc on the license server hpconsole, and a pgroupd process is running there.
The following is the OFED log, which reports the license problem:
Configuring MPICH Version 1.2.7 (release) of : 2005/06/22 16:33:49
checking whether filesystem respects case in file names… yes
checking for current directory name… /var/tmp/OFED_topdir/BUILD/mvapich-1.1.0-3143
checking for install
checking for ranlib
checking for gnumake… yes using --no-print-directory
checking whether make supports include… yes
checking for OSF V3 make… no
checking for virtual path format… VPATH
checking whether pgCC returns correct error code… yes
checking whether selected C++ compiler can compile iostream.h… no!..Cannot use pgCC -fPIC compiler
checking for cc… no
checking whether the compiler pgcc accepts ANSI prototypes… no
*# The compiler pgcc does not accept ANSI prototypes
checking for gcc… no
checking whether cross-compiling… yes
checking whether the compiler pgcc runs… no
pgi-cc-lin64: LICENSE MANAGER PROBLEM: Failed to checkout license
pgi-cc-lin64: LICENSE MANAGER PROBLEM: Failed to checkout license
pgi-cc-lin64: LICENSE MANAGER PROBLEM: Failed to checkout license
pgi-cc-lin64: LICENSE MANAGER PROBLEM: Failed to checkout license
pgi-cc-lin64: LICENSE MANAGER PROBLEM: Cannot connect to license server system.
The license server manager (lmgrd) has not been started yet,
the wrong port@host or license file is being used, or the
port or hostname in the license file has been changed.
Feature: pgi-cc-lin64
Server name: hpconsole
License path: /hptc_cluster/pgi/license.dat:
FLEXnet Licensing error:-15,570. System Error: 115 “Operation now in progress”
For further information, refer to the FLEXnet Licensing documentation,
available at “www.macrovision.com”.
error: Bad exit status from /var/tmp/rpm-tmp.1781 (%install)
Does this setup work on the other nodes and the head node? Did you compile on this node prior to its crash? I ask since a typical CDK install has the compilers installed on the head node with only a few runtime support libraries installed on each of the nodes.
If you are able to compile on the other nodes with this setup, I would suspect that the node can’t communicate with the head node. Another possibility is that the server name in the license.dat file is different from the name used by this node to communicate with the head node.
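One quick way to test the communication theory from the affected node is a plain TCP connection attempt to the license server. This is only a minimal sketch: `can_reach` is a helper name invented for the example, and the port has to be whatever your license.dat specifies (27000 below is an assumed value, not something shown in the log).

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if host resolves and accepts a TCP connection on port."""
    try:
        # create_connection resolves the name and connects in one step;
        # resolution failures, refusals and timeouts all raise OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call from cn03 (port 27000 is an assumption; use the one
# from your license.dat):
#     can_reach("hpconsole", 27000)
```

A False result points to name resolution, routing, or a firewall rather than to the license file contents. A True result only shows that something is listening on that port, so the server name in license.dat is still worth checking.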
If this doesn’t help, please send a note to license@pgroup.com. I know the basics of licensing, but our customer service is better at diagnosing more complex licensing problems.