Request: GPU Memory Junction Temperature via nvidia-smi or NVML API

Is there an ETA on this functionality, please?

I hope this feature doesn't take long to arrive.

+1, we're waiting.

Any news on this? This request dates from February 2021, was prioritized for development in April, and was reconfirmed as prioritized in June. How many releases do you do in a year?

I swear if they spent a couple of hours on the driver, they could release it the next day! It's really not that big a deal: just port the NVAPI reading into NVML and be done with it.

@wpierce , any ETA?
Thank you.

Hello, NVIDIA?!? Our GDDR6X is cooking itself in the summer heat, and even with the AC on it's hit and miss. We Linux users have no way to monitor our memory junction temperature (Tjunction)!!!

Yeah, I just tried to wrap nvidia-smi into one of my monitoring scripts (icinga2). This is for an RTX 3080.
The nvidia-smi documentation lists these fields:

“temperature.gpu”
Core GPU temperature. in degrees C.

“temperature.memory”
HBM memory temperature. in degrees C.

So I assumed the memory junction temperature would be temperature.memory, but it just returns “N/A”.

If it really isn't available, then NVIDIA's native tools need to expose this reading, because many monitoring applications depend on it. I'm going to keep looking; for now, just +1'ing this thread!

+1

We need this ASAP.

@wpierce This was prioritized and assigned to an engineer almost 3 weeks ago. When can we expect it?

In case the urgency isn't clear: my 3090s run PyTorch or TensorFlow models 24/7/365, and I'm unable to monitor their temperatures. Since this was prioritized, I've put another 430 hours of use on my expensive GPUs with no way to monitor their condition. These cards are too expensive to be wear-and-tear items, so we'd all appreciate the tools to monitor our equipment…ASAP.

+1

Also extremely interested in being able to see this on Linux machines.

Thank you!