Request: GPU Memory Junction Temperature via nvidia-smi or NVML API

Thanks - I will definitely use this as ammo.
For interest - which cable did you replace? Was it the GPU power cable?

Thanks for sharing - and this one use case is firmly on my list.

USB cable (running the card on a riser).

@nadeemm yeah we know how you keep those lists - “things I should find an excuse for, next time someone asks for X or Y”

Don’t worry, folks. Memory temperature isn’t applicable to anything. nvidia-smi told me so.

➜  ~ nvidia-smi -q |grep "Memory Current Temp"
        Memory Current Temp               : N/A
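For anyone who wants to log this field from a script, here is a minimal sketch that parses the `nvidia-smi -q` output shown above. The helper names `parse_memory_temp` and `read_memory_temp` are made up for illustration. It also assumes that a populated value is printed like the other temperature fields (for example `92 C`). On cards where the driver reports `N/A`, it simply returns `None`.

```python
import re
import subprocess
from typing import Optional

# Matches the "Memory Current Temp" line from `nvidia-smi -q` output.
_MEM_TEMP_RE = re.compile(r"^\s*Memory Current Temp\s*:\s*(.+?)\s*$", re.MULTILINE)


def parse_memory_temp(query_output: str) -> Optional[int]:
    """Return the first reported memory temperature in degrees C.

    Returns None when the field is missing or reported as N/A.
    Assumption: a populated value looks like "92 C".
    """
    match = _MEM_TEMP_RE.search(query_output)
    if match is None:
        return None
    number = re.match(r"(\d+)\s*C\b", match.group(1))
    return int(number.group(1)) if number else None


def read_memory_temp() -> Optional[int]:
    """Run `nvidia-smi -q` and parse its output (hypothetical helper)."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # nvidia-smi is not installed or failed to run.
        return None
    return parse_memory_temp(result.stdout)


if __name__ == "__main__":
    temp = read_memory_temp()
    print("Memory temp:", "N/A" if temp is None else f"{temp} C")
```

On a card like the one above this prints `Memory temp: N/A`, which is exactly the problem this thread is about: the parsing is trivial, the driver just does not expose the value.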

I would also like to request that this be included in Linux. I am working on AI/ML with my 3090s on Linux, and quite often I run into issues that I think are thermal related but can’t prove, because no info is available.


I am also a researcher and use a 3090 for my Linux workflow. I just got a paper accepted in part thanks to my 3090 (preprint: DeepVASP-E: A Flexible Analysis of Electrostatic Isopotentials for Finding and Explaining Mechanisms that Control Binding Specificity | bioRxiv), but I really could have used the memory junction temperature as well, since my card throttles at times under heavy loads!


Nvidia, please add this function to control memory temperature under heavy load in Linux (Ubuntu).


Aren’t there already hundreds of people loyal to your products demanding better software support for the product they just bought? Do you love to be criticized? Do you enjoy upsetting the community?
Give people the peace of mind that they won’t burn the hardware with extensive loads 24/7 by sharing a proper monitoring tool, not pointless promises about Micron specs and all of that cooking! THAT’S IT!


Please implement the memory junction temp control in Linux at last.


All we are asking is whether we can have the same “GPU Memory Junction Temperature” value in Linux that HWiNFO, GPU-Z and other programs have in Windows (especially for GPUs with GDDR6X memory). Even if it is not the exact Tj max temp of the chip, it still gives us a very close estimate of the hottest part of the memory module.

I understand that this information is not native to the NVML API (in both Windows and Linux), but shouldn’t it be added?


Can you do the same comparison for an RTX 3090?

Can you please get a team to legitimately fix this feature in Linux? The length of time this has taken to come out is almost an embarrassment for Nvidia card owners at this point.