Hello Nvidia Forum Community,
I hope this message finds you well. I am currently working on setting the temperature threshold for a GPU using the nvmlDeviceSetTemperatureThreshold API, but I have encountered an issue that I’m struggling to resolve.
Here’s a snippet of my code:
if (TEMPERATURE_LIMIT != OFF) {
    /* NVML takes the new threshold by pointer, in degrees C. */
    int temperatureLimit = TEMPERATURE_LIMIT;
    result = nvmlDeviceSetTemperatureThreshold(gpus[i], NVML_TEMPERATURE_THRESHOLD_SLOWDOWN, &temperatureLimit);
    if (result != NVML_SUCCESS) {
        printf("Failed to set temperature limit for GPU:%d, %s\n", i, nvmlErrorString(result));
        exit(EXIT_FAILURE);
    }
}
Despite using a valid GPU device (as confirmed by successful operations like setting power limits on the same device), a valid threshold type (NVML_TEMPERATURE_THRESHOLD_SLOWDOWN), and a non-null pointer to the temperature limit (&temperatureLimit), I consistently receive an INVALID_ARGUMENT error.
I have reviewed the NVML documentation and confirmed that my parameters meet the documented requirements. The perplexing part is that the same GPU device works correctly with other NVML APIs.
Any insights, suggestions, or guidance on how to troubleshoot and resolve this issue would be highly appreciated.
Thank you in advance for your assistance!
Best regards,
Varun Parashar
IIITD