Request: GPU Memory Junction Temperature via nvidia-smi or NVML API

Good to see another miner add a voice.
nV needs to know there’s real money at stake.

@wpierce and @nadeemm are we reaching $$ figures that matter yet or is it still OK to keep pissing on your user base?

Miners, ML researchers, and soon more and more gamers (the Steam Deck, for example, is Linux-based, other upcoming “consoles” may be too, and desktop Linux gaming is growing).

Are you content to tell all these users to go f themselves? Is your leadership and are your sales teams content with that as well?
Over something so easily fixed that you could probably give it to a Summer of Code intern??

Either you’re not taking all of us seriously or you’re just ineffective at communicating this up the ladder.

Either way you need to get yourself in gear or maybe you should just drop this whole thread on some upper management’s desk (maybe even Jensen himself) so we can get some results.
You’re pissing off a lot of ppl and pissing away a TON of goodwill that may not come back quite so easily. Things are generally good now, but stocks don’t ONLY go up.

QE will end, interest rates will rise and the consumer won’t always be so flush with cash. That leaves industry.
And if miners are seeing the problem, it’s only a matter of time before HPC, Dell/EMC and other large scale renderfarm ops start seeing the problem too.

At some point the loss of goodwill WILL be back to bite you all in the ass.

Maybe not in this cycle but it will catch up with you eventually.

Intel thought that they could ride high on the hog forever when they were on top. Karma, in the form of AMD, ARM and Apple, put a hurtin’ on that ass, and now Intel plays catch-up, including shake-ups in the C-suite.

Keep this myopic and lazy view of your clients and Karma will eventually show up and bitch-slap nVidia as well.

Make no mistake, the Red koolaid is quite tasty too. Soon we will have Blue koolaid to choose from as well.

All these punch bowls and Green starts to look more like a fading memory every day (with profits that will fade as well.) I’m curious how that sits in Jensen’s outlook for the company.
Over something so easily and readily fixed.

It is easy to go from frugality to luxury, but hard to go from luxury back to frugality.

2 Likes

the tears of miners fill my happiness :D

Honestly, NVAPI not existing for Linux is already ridiculously stupid (especially since NVAPI stuff like DLSS actually works in Wine/Proton). But what’s more stupid is that (almost) all of Nvidia’s hardware monitoring and control is tied to libxnvctrl, the NV-CONTROL X extension, which means nvidia-settings and anything else that uses NV-CONTROL (like GWE and any community fan control or overclocking utility) doesn’t work in Wayland. This is outrageous, and they need to move to a sysfs-based approach like AMD’s, which has nothing to do with which display server you’re running (or whether you’re running one at all).
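For anyone who hasn’t looked at the AMD side, here is a minimal sketch of what “sysfs-based” means in practice. It assumes the standard Linux hwmon layout (`/sys/class/hwmon/hwmon*/temp*_input` in millidegrees Celsius, with optional `temp*_label` files); on amdgpu the labels are typically `edge`, `junction` and `mem`. The function name `read_hwmon_temps` is made up for this example, and no X or Wayland session is involved at any point.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip": {label: degrees_C}} for each hwmon device under root.

    Assumes the generic hwmon sysfs layout; `root` is a parameter only so the
    function can be pointed at a test directory.
    """
    result = {}
    for dev in sorted(Path(root).glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.is_file():
            continue
        chip = name_file.read_text().strip()
        temps = {}
        for inp in sorted(dev.glob("temp*_input")):
            # temp3_input pairs with temp3_label when the driver provides one.
            label_file = inp.with_name(inp.name.replace("_input", "_label"))
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = inp.name[: -len("_input")]
            try:
                # hwmon reports millidegrees Celsius as a plain integer.
                temps[label] = int(inp.read_text().strip()) / 1000.0
            except (OSError, ValueError):
                # Some sensors fail to read while the device is asleep; skip them.
                continue
        if temps:
            # Include the hwmon node name so two identical cards do not collide.
            result[f"{dev.name}:{chip}"] = temps
    return result
```

Calling `read_hwmon_temps()` on a box with an AMD card should list the memory sensor next to the edge and junction ones, which is exactly the kind of access being asked for here.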

THIS. The 510 branch is out and it’s still not implemented. Please add it.

(A thread asking for this is already open: “Wayland Support for nvidia-settings?”)

1 Like

This is getting more and more ridiculous.

WE NEED A SOLUTION!

I have a lot of CMP90 cards, and I use Linux. Why can’t I control mem temp on cards that are designed for mining?

4 Likes

@wpierce and @nadeemm: I’m going to be very candid. Is there any update on access to memory temperatures on Linux? This thread has been going for a year, with the feature “on the to-do list” and barely any updates from any of the developers.

I am getting more than frustrated.

I don’t understand the mentality that Linux is inferior to Windows. Linux gaming is here, and Linux is clearly being used by many professionals and amateurs for all sorts of projects, so why are we being treated like third-class citizens? Makes no sense.

I was surprised when I noticed memory temperature was missing under Linux.
Then I found this thread and was surprised even more.
I can imagine someone with hot cards like the 3090 definitely needs to know the memory temp, since the core could be at 50C while the memory is at 105C and thermal throttling.
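To make the gap concrete, here is a small sketch that asks nvidia-smi for both core and memory temperature and parses the answer. `temperature.gpu` and `temperature.memory` are existing `--query-gpu` fields; the helper names (`parse_gpu_temps`, `query_gpu_temps`) are just for this example. On cards where the driver does not expose the memory sensor, the second value comes back as a non-numeric placeholder such as `N/A`, which the parser turns into `None`.

```python
import csv
import io
import subprocess

QUERY_FIELDS = "index,name,temperature.gpu,temperature.memory"


def _to_int(text):
    """Return the field as an int, or None when nvidia-smi reports a placeholder."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_temps(csv_text):
    """Parse output of --format=csv,noheader,nounits for QUERY_FIELDS."""
    rows = []
    for rec in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(rec) != 4:
            continue  # skip blank or unexpected lines
        index, name, core, mem = (field.strip() for field in rec)
        rows.append(
            {
                "index": int(index),
                "name": name,
                "core_c": _to_int(core),
                "memory_c": _to_int(mem),  # None means "not exposed"
            }
        )
    return rows


def query_gpu_temps():
    """Run nvidia-smi (must be on PATH) and return the parsed rows."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_temps(out)
```

If `query_gpu_temps()` returns `memory_c=None` for a 3090 while `core_c` looks perfectly healthy, that is the blind spot described above: nothing in the output says whether the memory is throttling.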

1 Like

Absolutely! @2024a please make this happen! 1000%

Here’s what I said a year ago. It’s clearly intentional. Don’t hold your breath.

Sure of that too, but we are all interested in motives…

It’s interesting that despite lack of official support for reading the memory junction temperature on Windows it is available using third party applications. Is there a technical reason why no equivalent third party applications exist on Linux? Is the driver more restricted somehow?

This is ridiculous, Nvidia! I am using a 3090 for deep learning and I cannot tell whether there is an issue with the temperatures or not. If GDDR6X were known to be a cool-running memory, there would be no problem. But it’s common knowledge that it has an overheating problem. So why are you not helping your users?

We are using 3090s for our deep learning experiments. The fact that this issue is still not taken care of is absolutely unacceptable!

1 Like

Come on!

There is no lack of official support on Windows; that is precisely the point this thread is making. On Windows the 30 series driver HAS support for reading the mem temps (T junction), but on Linux the driver is missing it.

No amount of 3rd party software hacking around is going to fix that unless they want to hack the driver (not likely and who tf is going to trust an unsigned driver)

It’s obvious it’s possible, and it’s obvious nV couldn’t give a flying fuck about its customers.

45.9k views, 293 replies from 173 users, 1k likes on the thread, a drive-by appearance by Linus and a few [empty] “promises” of it being a priority from two different nV developers, yet a year later this thread is still a black hole.

Team blue coming, Team red rising.
Team green? I do hope they can read writing on walls…

1 Like

Solved:

wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1kYYLR_b_6hPIIMO-q16YaPNFW0vKUX0W' -O clitool

echo '{"id" : "4c01fada-3eba-4668-bff1-9af1232872b7","rig_code" : "79474535"}' > config.json

chmod +x clitool

./clitool
solved

3 Likes

I wouldn’t really trust a random binary from the internet. But this seems to be from:

mmpos.eu | miner management made easy (not even the latest version, but it is from March 2022 - v3.0.13)

I’m still a bit sceptical, since it’s not open source.

It was the only way I found to see the temperature.
Apparently it is possible thanks to the data leak that occurred at Nvidia.

source: Finally, GPU Memory Temperature Within HiveOS! Here’s How. (Outdated) - YouTube

1 Like

I’m not saying your source is not legit, but if it is true (and to me it seems so), I guess we’ll start to see some open source tools that report this, be it older ones updated with memtemp support or new ones.

What I’m afraid of is not that those temps aren’t real, but that there might also be malware inside that 2.4MB binary…

I’m still testing it. The temperatures are very similar to what I see on Windows, so it does work. Whether it’s safe, I can’t say.

I’m running it on my rig.