nvmlDeviceSetTemperatureThreshold API Returns INVALID_ARGUMENT Error

Hello Nvidia Forum Community,

I hope this message finds you well. I am currently working on setting the temperature threshold for a GPU using the nvmlDeviceSetTemperatureThreshold API, but I have encountered an issue that I’m struggling to resolve.

Here’s a snippet of my code:

    if (TEMPERATURE_LIMIT != OFF) {
        int temperatureLimit = TEMPERATURE_LIMIT;
        result = nvmlDeviceSetTemperatureThreshold(gpus[i], NVML_TEMPERATURE_THRESHOLD_SLOWDOWN, &temperatureLimit);
        if (result != NVML_SUCCESS) {
            printf("Failed to set temperature limit for GPU:%d, %s\n", i, nvmlErrorString(result));
            exit(EXIT_FAILURE);
        }
    }

Despite using a valid GPU device (as confirmed by successful operations like setting power limits on the same device), a valid threshold type (NVML_TEMPERATURE_THRESHOLD_SLOWDOWN), and a non-null temperature limit (temperatureLimit), I consistently receive an INVALID_ARGUMENT error.

I have thoroughly reviewed the NVML documentation and ensured that my parameters meet the requirements. The perplexing part is that the same GPU device works well with other NVML APIs.

Any insights, suggestions, or guidance on how to troubleshoot and resolve this issue would be highly appreciated.

Thank you in advance for your assistance!

Best regards,
Varun Parashar
IIITD

Only NVML_TEMPERATURE_THRESHOLD_ACOUSTIC_CURR is a valid input for that function.

Hi, thanks for your reply. I used NVML_TEMPERATURE_THRESHOLD_ACOUSTIC_CURR as the threshold type, but now I get a ‘Not Supported’ error, which seems to mean that my device (a P100 GPU) has no temperature sensor. However, the documentation says the function is supported ‘For Maxwell or newer fully supported devices’.

Documentation is sometimes wrong. If you can’t change it via nvidia-smi, it can’t be changed.
