Here are the steps I took to get Modulus working on an HPC cluster with V100s. The same steps also worked on a machine with a single A100 when I last checked a few months ago.
Create virtual environment for SimNet
conda create --name SimNetv21 python=3.7
conda activate SimNetv21
Install prerequisites
pip install cmake
conda install -c anaconda gxx_linux-64
pip install horovod==0.21
conda install -c conda-forge tensorflow-gpu=1.15
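Before moving on, it can help to confirm that the build tools are actually visible inside the activated environment, since a pip build of horovod will typically fail if cmake is not on the PATH. The snippet below is only a minimal sketch: `check_tools` is a hypothetical helper name, and the list of tools passed to it is an assumption you should adjust to your own setup.

```shell
# Sketch: report which prerequisite tools are visible on PATH.
# check_tools is a hypothetical helper, not part of SimNet or conda.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            echo "found:   $ct_tool -> $(command -v "$ct_tool")"
        else
            echo "missing: $ct_tool"
            ct_missing=1
        fi
    done
    # Returns 0 only if every requested tool was found.
    return "$ct_missing"
}

# The tool list here is an assumption; extend it as needed.
check_tools python pip cmake || echo "Some prerequisites are missing; re-check the steps above."
```

Run it right after `conda activate SimNetv21` so that the PATH being checked is the environment's own.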
Install SimNet
Now that the environment has been set up with the required prerequisites, you can follow the Bare metal installation instructions found within the SimNet user guide:
pip install matplotlib transforms3d future typing numpy quadpy \
numpy-stl==2.11.2 h5py sympy==1.5.1 termcolor \
psutil symengine==0.6.1 numba Cython chaospy
pip install -U https://github.com/paulo-herrera/PyEVTK/archive/v1.1.2.tar.gz
tar -xvzf ./SimNet_source.tar.gz
cd ./SimNet/
python setup.py install
To run examples that use the STL point cloud generation, you will need to put libsdf.so in your library path and install the accompanying PySDF library. This can be done as follows:
cd ..
export LD_LIBRARY_PATH=$(pwd)/SimNet/external/pysdf/build/:${LD_LIBRARY_PATH}
cd ./SimNet/external/pysdf/
python setup.py install
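If the STL examples later complain that the library cannot be loaded, a quick check is to see whether libsdf.so is reachable through LD_LIBRARY_PATH at all. The following is a rough sketch under that assumption; `find_in_path_list` is a hypothetical helper name introduced here for illustration.

```shell
# Sketch: search each directory of a colon-separated path list for a file.
# find_in_path_list is a hypothetical helper, not part of SimNet or PySDF.
find_in_path_list() {
    fp_name=$1
    fp_list=$2
    fp_old_ifs=$IFS
    IFS=:
    for fp_dir in $fp_list; do
        # Skip empty entries (e.g. from a trailing colon).
        if [ -n "$fp_dir" ] && [ -e "$fp_dir/$fp_name" ]; then
            IFS=$fp_old_ifs
            echo "$fp_dir/$fp_name"
            return 0
        fi
    done
    IFS=$fp_old_ifs
    return 1
}

# Prints the full path of the first match, or a hint if nothing is found.
find_in_path_list libsdf.so "${LD_LIBRARY_PATH:-}" \
    || echo "libsdf.so not found on LD_LIBRARY_PATH; re-check the export above."
```

Note that the export only lasts for the current shell session, so it has to be repeated (or added to your job script) in every new session that runs these examples.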
Adjusting SimNet
To edit the SimNet code, navigate to the SimNet source directory, /SimNet/simnet/, and edit or replace the desired files. Then update the installed SimNet package with setup.py, just as before:
cd ./SimNet/
python setup.py install
Configuring SimNet environment for HPC
When installing SimNet on the HPC cluster, you may run into CUDA library issues. To resolve them, you can create a symbolic link that points TensorFlow to the location where CUDA is installed.
First, create a sandbox from the container. The sandbox gives you access to all the files needed to run SimNet.
singularity build --sandbox SimNetv21_sandbox docker-archive://simnet_image_v21.06.tar.gz
You can then upload the required CUDA files to your HPC space and create a symbolic link pointing to the needed CUDA library.
The symbolic link needs to be created in the directory of each SimNet case you want to run. For example, to run the Helmholtz example, you have to create the link in that example's directory:
cd ./examples/Helmholtz
ln -s /u/…/SimNet_sandboxv21/usr/local/cuda ./cuda_sdk_lib
With the symbolic link in place, you can now run training as usual.
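Because the link has to exist in every case directory, a small loop can save some typing when you run several examples. This is only a sketch: `link_cuda_into_cases` is a hypothetical helper name, and the paths in the example call are placeholders that you need to replace with your own sandbox and case locations.

```shell
# Sketch: create the cuda_sdk_lib link in several case directories at once.
# link_cuda_into_cases is a hypothetical helper, not part of SimNet.
# Usage: link_cuda_into_cases <absolute_cuda_dir> <case_dir>...
link_cuda_into_cases() {
    lc_cuda_dir=$1
    shift
    if [ ! -d "$lc_cuda_dir" ]; then
        echo "CUDA directory not found: $lc_cuda_dir" >&2
        return 1
    fi
    for lc_case in "$@"; do
        if [ ! -d "$lc_case" ]; then
            echo "skipping (not a directory): $lc_case" >&2
            continue
        fi
        # -f replaces an existing link; -n keeps ln from following an old
        # link that already points at a directory.
        ln -sfn "$lc_cuda_dir" "$lc_case/cuda_sdk_lib"
        echo "linked $lc_case/cuda_sdk_lib -> $lc_cuda_dir"
    done
    return 0
}

# Example call (placeholder paths, adjust to your own layout):
# link_cuda_into_cases /path/to/sandbox/usr/local/cuda ./examples/Helmholtz ./examples/another_case
```

Pass the CUDA directory as an absolute path; a relative target would be resolved relative to each case directory and would most likely leave the links dangling.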