CUDA-aware OpenMPI: No GPU process when benchmarking

I’m attempting to configure OpenMPI to use CUDA. However, once the system is set up and I run an mpirun command and check the GPU processes with nvidia-smi, no processes are listed. (Some GPU memory does appear to be in use while the benchmarks are running, compared with none when they are not, but I’d really like the GPU to do some processing.)

I’m using the NASA NAS Parallel Benchmarks, specifically EP (Embarrassingly Parallel), since I would expect a tangible performance difference between a CUDA-aware and a non-CUDA OpenMPI build, but I don’t see any gains.

Note that when no benchmark is running, no GPU memory is used.

I installed both CUDA and the NVIDIA driver from runfiles.

My NVIDIA driver version is 460.91.03.
My CUDA version is 11.2; I’ve also attached the output of nvcc --version.

(In order: nvidia-smi when not running benchmark, nvcc --version, and nvidia-smi when running benchmark)

I’m also using UCX, configured with this command:
./configure --prefix=/usr/local/ucx/ --with-cuda=/usr/local/cuda-11.2
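One way to confirm that the resulting UCX build actually picked up CUDA is to look at the "Configured with" line that ucx_info -v prints. The following is only a sketch: the helper name ucx_has_cuda is made up for illustration, and it assumes the ucx_info on PATH is the one installed under the prefix above.

```shell
# Hypothetical helper (not part of UCX): reads `ucx_info -v` output on stdin
# and reports whether the configure flags mention --with-cuda.
ucx_has_cuda() {
  if grep -q -e '--with-cuda'; then
    echo "UCX configured with CUDA: yes"
  else
    echo "UCX configured with CUDA: no"
  fi
}

# Assumption: ucx_info on PATH belongs to the /usr/local/ucx install.
if command -v ucx_info >/dev/null 2>&1; then
  ucx_info -v | ucx_has_cuda
else
  echo "ucx_info not found on PATH"
fi
```

If several UCX installs are present, calling the binary by its full path under the install prefix avoids checking the wrong one.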

I then configured OpenMPI as follows:
./configure --prefix=/usr/local/ompi --with-ucx=/usr/local/ucx --with-cuda=/usr/local/cuda-11.2
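To check whether this OpenMPI build reports CUDA support, ompi_info exposes an mpi_built_with_cuda_support parameter in its parsable output. The sketch below wraps that check; the helper name ompi_has_cuda is made up for illustration, and it assumes the ompi_info on PATH is the one from the /usr/local/ompi install rather than a distribution package.

```shell
# Hypothetical helper (not part of Open MPI): reads
# `ompi_info --parsable --all` output on stdin and reports the value of
# the mpi_built_with_cuda_support parameter.
ompi_has_cuda() {
  if grep -q 'mpi_built_with_cuda_support:value:true'; then
    echo "Open MPI reports CUDA support: yes"
  else
    echo "Open MPI reports CUDA support: no"
  fi
}

# Assumption: ompi_info on PATH belongs to the /usr/local/ompi install.
if command -v ompi_info >/dev/null 2>&1; then
  ompi_info --parsable --all | ompi_has_cuda
else
  echo "ompi_info not found on PATH"
fi
```

This only shows how the library was built; it says nothing about whether a given benchmark passes GPU buffers to MPI.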

I’ve been banging my head on this for a couple of days now, so any insight would be greatly appreciated.