Request: GPU Memory Junction Temperature via nvidia-smi or NVML API

Thanks for sharing - and this one use case is firmly on my list.


@nadeemm yeah we know how you keep those lists - "things I should find an excuse for, next time someone asks for X or Y"

Don't worry, folks. Memory temperature isn't applicable to anything. nvidia-smi told me so.

➜  ~ nvidia-smi -q | grep "Memory Current Temp"
        Memory Current Temp               : N/A
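
For anyone scripting around this in the meantime, here is a minimal polling sketch (assuming Python 3 and nvidia-smi on the PATH; the field name is just what the -q output above shows, and only the first GPU that reports it is read):

import subprocess, time

def memory_temp_c():
    # Returns the reported memory temperature in C, or None when the driver prints N/A.
    out = subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if "Memory Current Temp" in line:
            value = line.split(":", 1)[1].strip()   # e.g. "86 C" or "N/A"
            return None if value == "N/A" else int(value.split()[0])
    return None

while True:
    temp = memory_temp_c()
    print("memory junction temp:", "not exposed by this driver" if temp is None else f"{temp} C")
    time.sleep(5)
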
2 Likes

I would also like to request that this be included in Linux. I am working on AI/ML with my 3090s on Linux, and I quite often run into issues that I suspect are thermal-related but can't prove, because no memory temperature info is available.

1 Like

I am also a researcher and use a 3090 for my Linux workflow. I just got a paper accepted in part thanks to my 3090 (preprint: DeepVASP-E: A Flexible Analysis of Electrostatic Isopotentials for Finding and Explaining Mechanisms that Control Binding Specificity | bioRxiv), but I really could have used the memory junction temperature as well, because my card throttles at times under heavy loads.

4 Likes

Nvidia, please add this feature so we can monitor memory temperature under heavy load on Linux (Ubuntu).

1 Like

Aren't there already hundreds of people loyal to your products demanding better software support for the hardware they just bought? Do you love being criticized? Do you enjoy upsetting the community?
Give people the peace of mind that they won't burn up their hardware under sustained 24/7 loads by shipping a proper monitoring tool, not pointless promises about Micron specs and all of that cooking! THAT'S IT!

2 Likes

Please implement memory junction temperature reporting in Linux at last.

4 Likes

All we are asking is to have the same "GPU Memory Junction Temperature" value on Linux that HWiNFO, GPU-Z, and other programs show on Windows (especially for GPUs with GDDR6X memory). Even if it is not the exact junction temperature of the chip, it still gives a very close estimate of the hottest spot on the memory module.

I understand that this information is not exposed natively by the NVML API (on either Windows or Linux), but shouldn't it be added?
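
To illustrate the gap: through the NVML Python bindings (nvidia-ml-py / pynvml; just a sketch, not an official sample), the only temperature you can request via the public sensor enum is the GPU core one - there is simply no memory-junction sensor type to pass in:

import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
    # NVML_TEMPERATURE_GPU is the core/die sensor; the public enum has no
    # memory-junction entry, which is exactly what this thread is asking for.
    core_temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"GPU core temp: {core_temp} C")
finally:
    pynvml.nvmlShutdown()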

2 Likes

Can you do the same comparison for an RTX 3090?

Can you please get a team to legitimately fix this feature in Linux? The length of time this has taken to come out is almost an embarrassment for Nvidia card owners at this point.

The number of crypto miners in this thread throwing a fit and getting actual responses from Nvidia, while thousands of threads from paying customers don't get even a single reply, is telling. Money talks, I guess.

FWIW, I'm working on an Nvidia GPU monitoring utility based on NVML, and this would be useful for me too, although it sounds like HWiNFO (or whatever) is doing something special to get it. It'd also be nice if a default value were added to the acoustic enum, per-process utilizations were fixed, GPU clock reading were fixed on Windows, and application setting API support were added for my EVGA GTX 1080 FTW2 ICX.

Also, the ability to increase the voltage on Pascal+ would be nice too.
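
For what it's worth, here is the sort of data the current NVML bindings do expose cleanly - clocks and whole-GPU utilization - which is what makes the missing memory temperature stand out (again a sketch using nvidia-ml-py, not code from the utility mentioned above):

import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)    # MHz
    mem_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_MEM)  # MHz
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # whole-GPU rates, not per process
    print(f"SM clock {sm_clock} MHz, mem clock {mem_clock} MHz, "
          f"GPU util {util.gpu}%, mem-controller util {util.memory}%")
finally:
    pynvml.nvmlShutdown()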

It's not just about crypto mining. It's common sense.
Say you buy a car, but the car manufacturer won't let you see the oil/coolant temperature. Kinda the same thing.

I agree that having a sensor for reading memory temp is "common sense" and wish they had required AIBs to provide a standard memory temp sensor.

But what I think is happening here is that some partners are adding extra sensors to their cards, like EVGA with their ICX tech, and HWiNFO is reading those. I would be thrilled if they exposed those sensors, but I think it's unrealistic without major collaboration between Nvidia and partners.

Well, those sensor values have to pass through the core anyway, like everything else.
They are located at some bus/address/whatever. Who knows how HWiNFO is reading them; I doubt they'd share that information, but a pointer to where the values can be read from... has to come from Nvidia. So... the temps are there, they're just not telling us how to read them.
The other option is to disassemble HWiNFO and look for it ourselves. If someone's skilled enough to pull that off, what are they waiting for... Because from where I'm standing... Nvidia are just sitting on their asses avoiding this like the plague.

Please, Nvidia people, we need this function to monitor the memory temperature on Linux. Why provide this only on Windows???
Thanks in advance.

Can we please get this feature rolled out to Ubuntu? It would really help with some projects.

NVIDIA, can you just implement this function (memory temp data) in Linux without any talking around it? Personally, I can't believe this is so damn hard to do...
Unless you have a reason not to do it...

I sold my RTX 3080 because of the VRAM problem. It runs too hot, and I'm afraid it will hurt the device in the long term. Luckily my 3080 cost $1000 when I bought it, and I sold it for $2000 now because of cryptocurrency, even though I had used it for a year. With that budget I've decided to buy two RTX 3060 12 GB cards for machine learning and stop worrying about the GDDR6X temperature problem.