The batch scheduling system Slurm can use NVML to autodetect GPU hardware. I currently use Slurm 21.08 compiled against NVML 11.4 with an A100 GPU. I now want to use an H100 GPU (I don’t have it yet). Will autodetection work with this version? Which NVML version do I need for H100 support? I couldn’t find any documentation on this.
Hi,
Hopper support was enabled in CUDA 11.8, so any NVML release from CUDA 11.8 or newer will support H100. Please note that there was an ABI breakage bug in the NVML versions shipped with CUDA 12.2 and CUDA 12.2 U1, so we recommend either using an NVML version from 11.8 / 11.9 or waiting for the upcoming CUDA 12.2 U2 release. Thanks!
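The version rule above can be sketched as a small check. This is only an illustration, not an official tool: the function names, the constants, and the "12.2 U1" style label format are assumptions made for this example. The label is the CUDA release that NVML shipped with, not the version string NVML reports at runtime.

```python
import re

# Assumption: releases are written as "MAJOR.MINOR" or "MAJOR.MINOR U<n>".
MIN_HOPPER = (11, 8, 0)                  # Hopper support enabled in CUDA 11.8
ABI_AFFECTED = {(12, 2, 0), (12, 2, 1)}  # CUDA 12.2 and CUDA 12.2 U1


def parse_release(label):
    """Turn a label such as '12.2 U1' into a (major, minor, update) tuple."""
    m = re.fullmatch(r"\s*(\d+)\.(\d+)(?:\s*U(\d+))?\s*", label, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    major, minor, update = m.groups()
    return int(major), int(minor), int(update or 0)


def nvml_ok_for_h100(label):
    """True if the release is 11.8 or newer and not one of the ABI-affected ones."""
    release = parse_release(label)
    return release >= MIN_HOPPER and release not in ABI_AFFECTED


if __name__ == "__main__":
    for label in ("11.4", "11.8", "12.2", "12.2 U1", "12.2 U2"):
        print(label, nvml_ok_for_h100(label))
```

Under this rule the NVML 11.4 build from the question fails the check, so Slurm would need to be rebuilt against a newer NVML before H100 autodetection can be expected to work.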