Unable to run on more than 1 GPU

I am trying to run NVIDIA Modulus with the FPGA laminar flow example and a simulation I created myself. Both run and produce results, but I am unable to get either to use more than one GPU at a time.

Launching with SLURM srun.
Using the NVIDIA Modulus container, converted to Singularity.
Tested on nodes with T4 GPUs and, separately, on nodes with V100 GPUs.
I can start a Python session, import torch, and the reported device count matches however many GPUs I requested with srun (inside and outside the container).
nvidia-smi also shows the correct GPU count and types (inside and outside the container).
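
For reference, the check inside the container was roughly something like this (a minimal sketch; the exact session is not the point):

import torch

# Both calls report all of the GPUs granted by the srun allocation,
# even though the training itself only ever uses one of them.
print(torch.cuda.is_available())
print(torch.cuda.device_count())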

I had to modify the srun command provided in the Modulus documentation (Performance section) to set the correct number of GPUs:

srun --gres=gpu:t4:4 --cpus-per-gpu=8 singularity exec --nv -B /data:/data ./data/modulus.20.09.sif python ./data/MyProgramOrFPGAexample.py

I’ve attempted to specify the --mpi argument as none, pmi2, and pmix; none of them changes the GPU usage.

I’ve also attempted to use mpirun (-np 4). Adding it before the singularity command just launches that many containers, and adding it as an argument to run inside the container returns an error that the specified number of resources is not available, which seems odd considering nvidia-smi and torch show that the GPUs are available.

Any help is appreciated!

@patterson Looks like you don’t have the -n option in your srun command. Without it, srun defaults to running a single task: Slurm Workload Manager - srun.

That’s probably also why you’re not able to run with mpirun, since the hostfile likely only has one slot based on the SLURM allocation.

If you instead run with srun -n 4 ..., that will launch 4 tasks and each task will target one GPU. Ensure that the following environment variables are set appropriately inside the container so that Modulus and PyTorch can pick those up to set up the distributed job: SLURM_PROCID, SLURM_NPROCS, SLURM_LOCALID and SLURM_LAUNCH_NODE_IPADDR.
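
As a rough illustration only (not the exact code Modulus uses internally), those variables map onto a standard torch.distributed setup along these lines; the NCCL backend and port 29500 are just assumptions for the sketch:

import os
import torch
import torch.distributed as dist

# Sketch of how the SLURM_ variables feed a distributed setup;
# Modulus handles this internally, so this is only illustrative.
rank = int(os.environ["SLURM_PROCID"])            # global rank of this task
world_size = int(os.environ["SLURM_NPROCS"])      # total number of tasks
local_rank = int(os.environ["SLURM_LOCALID"])     # task index on this node
master_addr = os.environ["SLURM_LAUNCH_NODE_IPADDR"]

torch.cuda.set_device(local_rank)                 # one GPU per task
dist.init_process_group(
    backend="nccl",
    init_method=f"tcp://{master_addr}:29500",     # port chosen arbitrarily here
    rank=rank,
    world_size=world_size,
)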

Thanks for the suggestions; they pointed us in the right direction. Our single-node success with mpirun threw us off a bit, but we traced the problem back to the SLURM_ variables not showing up correctly inside the containers.

Perhaps it is just our Slurm configuration, but to get the SLURM_ environment variables to show up correctly, we need to allocate with an sbatch command and batch file:

sbatch slurmGPU.sbatch

where the slurmGPU.sbatch file contents are:

#!/bin/bash
#slurmGPU.sbatch

#SBATCH --job-name=Modulus 
#SBATCH --gpus=8  
#SBATCH --cpus-per-gpu=2
#SBATCH --output=sbatchOutput.txt

srun -n 8 singularity exec --nv -B /data:/data ./data/modulus.20.09.sif python ./data/mySimulationFile.py 

which then uses srun -n 8 inside the allocation to launch one task per GPU.
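
If anyone wants to sanity-check their own setup first, a throwaway script substituted for mySimulationFile.py (the contents below are just a sketch, not anything from Modulus) will show whether each of the 8 tasks sees the right variables and GPUs:

import os
import socket
import torch

# Each srun task prints what it sees, so the SLURM_ variables and
# GPU visibility can be confirmed inside the container.
print(
    f"host={socket.gethostname()} "
    f"rank={os.environ.get('SLURM_PROCID')}/{os.environ.get('SLURM_NPROCS')} "
    f"local_id={os.environ.get('SLURM_LOCALID')} "
    f"visible_gpus={torch.cuda.device_count()}"
)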

Hopefully this can help others. Thanks again.
