I’m running nvcr.io/nim/meta/llama-3.1-70b-instruct:latest
on 1xH100 SXM (same issue with 2xH100 SXM).
I’m getting the error:
2024-07-25T11:16:04.260537238Z INFO 07-25 11:16:04.260 ngc_profile.py:224] Detected 0 compatible profile(s).
2024-07-25T11:16:04.260603041Z ERROR 07-25 11:16:04.260 utils.py:21] Could not find a profile that is currently runnable with the detected hardware. Please check the system information below and make sure you have enough free GPUs.
2024-07-25T11:16:04.260630682Z SYSTEM INFO
2024-07-25T11:16:04.260636242Z - Free GPUs:
2024-07-25T11:16:04.260640641Z - [2330:10de] (0) NVIDIA H100 80GB HBM3 (H100 80GB) [current utilization: 0%]
The right approach is probably to list the available profiles, but I can't do that because I don't have a way to pass docker arguments. I tried guessing some profile names and setting them with the NIM_MODEL_PROFILES environment variable (see here), but that didn't work; a rough sketch of what I mean is below.
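For context, my understanding from the NIM documentation is that listing and pinning a profile would look roughly like the commands below if I could pass docker arguments directly. Treat this as a sketch of what I would expect to run, not something I've verified on this deployment: the list-model-profiles command, the NGC_API_KEY variable, and the profile override are taken from the docs as I remember them (and the docs I've seen use NIM_MODEL_PROFILE, singular, which may or may not be what this image expects).

# List the profiles the container considers compatible with the host GPUs
docker run --rm --runtime=nvidia --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  nvcr.io/nim/meta/llama-3.1-70b-instruct:latest \
  list-model-profiles

# Start the NIM with a specific profile pinned via environment variable
docker run --runtime=nvidia --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<profile-id> \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-70b-instruct:latest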
Can someone recommend which profile to pass? Or which other GPU to use if H100 SXM isn't supported (which would be odd)?