After changing thermal pads or paste, how do you validate that everything is OK? This wouldn’t be an issue if I had access to both memory and core temps. Core temps are exposed through nvidia-smi; memory temps are not. Relying on throttling alone is not the way to go: if the thermal pads don’t make contact with the memory, any significant load will just black-screen the system. Marginal contact will keep the system running, but memory temps will be higher than they should be, and the card will either throttle or, without throttling, run at a much higher temperature than it should.
There’s no public spec available for the thermal pads, and with the large variety of pads on the market there’s no 100% guarantee that the installed pads will have the best possible contact (or quality), or that the amount of thermal paste doesn’t prevent good contact. The wrong pads will shorten the card’s lifespan or even make it unusable.
A good example is watercooling: with a waterblock, memory temps will not go above 60 °C (in my case). With too much thermal paste on the die, memory temps will be above 80 °C, and there’s no way (under Linux) to validate contact. With too little thermal paste, the memory will have good contact, but die temps will be much higher. With a perfect application, both temps will be nice and low.
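Since the core temperature is the one reading nvidia-smi does expose, a partial check after a repaste is to log it together with the clocks while a load runs, then compare against a log taken before the change. The Python below is only a rough sketch under assumptions: the helper names (`parse_sample`, `read_sample`, `summarize`), the 1-second interval, and the 60-sample duration are mine, and it says nothing about memory temps or pad contact. It only helps spot the "too little paste on the die" case, where core temps come out much higher than before.

```python
import subprocess
import time

# nvidia-smi query fields: core temp (C), SM clock (MHz), memory clock (MHz).
QUERY_FIELDS = ["temperature.gpu", "clocks.sm", "clocks.mem"]


def parse_sample(line):
    """Parse one CSV line of nvidia-smi output into a dict of ints.

    Fields that are not reported as plain numbers (e.g. "[N/A]") become None.
    """
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"unexpected nvidia-smi output: {line!r}")
    return {
        name: int(value) if value.isdigit() else None
        for name, value in zip(QUERY_FIELDS, values)
    }


def read_sample(gpu_index=0):
    """Run nvidia-smi once and return the parsed readings for one GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sample(result.stdout.strip().splitlines()[0])


def summarize(samples):
    """Return the peak core temp and the lowest SM clock seen in the samples."""
    temps = [s["temperature.gpu"] for s in samples if s["temperature.gpu"] is not None]
    clocks = [s["clocks.sm"] for s in samples if s["clocks.sm"] is not None]
    return {
        "max_temp_c": max(temps, default=None),
        "min_sm_clock_mhz": min(clocks, default=None),
    }


if __name__ == "__main__":
    # Start a GPU load separately, then run this script alongside it.
    samples = []
    for _ in range(60):  # assumed duration: 60 samples, one per second
        samples.append(read_sample())
        time.sleep(1)
    print(summarize(samples))
```

Run it once before and once after the repaste under the same load and ambient conditions. A noticeably higher peak core temp or a lower minimum SM clock afterwards hints at a die contact problem, but a clean result here still does not prove the memory pads are seated.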