Recently I installed AUTOMATIC1111, a Stable Diffusion text-to-image webui that uses NVIDIA CUDA. Roughly one in three images comes out glitchy when I use half (FP16) precision or autocast. With no-half (FP32) I get normal images, but it halves performance: generation is slow and eats up my full VRAM.
I want to know why these glitchy images are happening. Where does the problem lie?
I don’t know what you mean by “glitchy image”. If you are referring to images with certain undesired artifacts, maybe circle the relevant portions of the image in red, so it is clear what we are talking about.
Artifacts in imagery could be due to either the reduced accuracy or the more restricted dynamic range of FP16 compared to FP32 (FP16 overflows above about 65504, whereas FP32 reaches about 3.4 × 10^38). From personal experience with s15.16 fixed-point arithmetic versus FP32 arithmetic in OpenGL-ES, severe artifacts are likely due to the latter, i.e. limited dynamic range causing overflow or underflow in intermediate computations. The risk of that occurring with FP16 is high.
This can sometimes be addressed with moderate effort by local re-scaling in the affected computation. If your application is open source, you could use standard debugging techniques to determine where the issue occurs (it may be more than one place) and try to remedy it. If this is closed source software, you can raise the issue with the software vendor, i.e. file a bug report.
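To illustrate the overflow scenario and the re-scaling remedy with a minimal NumPy sketch (the values here are purely illustrative, not taken from the webui's code): a sum of squares can overflow FP16's ~65504 limit even when the final result is well within range, and dividing by a local maximum first keeps the intermediates small.

```python
import numpy as np

# FP16 overflows above ~65504, so a sum of squares of moderately
# large values can overflow even though the final result (after a
# square root) would be representable.
x = np.full(4, 300.0, dtype=np.float16)

naive = np.sqrt(np.sum(x * x))   # x*x = 90000 overflows FP16 to inf

# Local re-scaling: divide by the maximum first so intermediates
# stay near 1.0, then scale the result back up.
m = np.max(np.abs(x))
rescaled = m * np.sqrt(np.sum((x / m) ** 2))

print(naive)      # inf
print(rescaled)   # 600.0, the correct norm
```

The same idea underlies techniques like loss scaling in mixed-precision training: shift values into FP16's representable range before the risky operation, then undo the shift afterwards.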
What I described above is what I consider the most likely scenario (that is, a plausible hypothesis); it is not a definitive root-cause analysis. Only thorough debugging can establish the root cause with certainty.
Generally speaking, issues with applications should be brought to the attention of the application vendor or creator, not the suppliers of underlying technology.
Thank you for the reply. I’ll ask the creator of the application.
If you get random glitches even when using the same seed and prompt, then it might be a hardware fault.
You could attempt to reduce the core and memory clocks of the GPU using commonly available overclocking tools. If the errors disappear afterwards, it’s likely a hardware issue.
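The idea behind that test can be sketched in plain NumPy (`generate` is a hypothetical stand-in for one seeded image generation, not the webui's actual code): with a fixed seed, repeated runs of the same computation must agree exactly on healthy hardware, so intermittent mismatches at identical settings point at the hardware rather than the software.

```python
import numpy as np

def generate(seed: int) -> np.ndarray:
    """Hypothetical stand-in for one seeded image generation."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((64, 64)) @ rng.standard_normal((64, 64))

# Same seed, two runs: on healthy hardware these agree bit-for-bit.
a = generate(1234)
b = generate(1234)
print(np.array_equal(a, b))  # True; a False here would indicate corruption
```

On a GPU the comparison is the same in spirit, though some CUDA kernels are intentionally non-deterministic, so the practical check is whether identical settings sometimes produce visibly different (glitched) outputs.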
Thanks. I’m not getting glitchy images every time; maybe 2 in 10 images I generate turn out glitchy.
How much should I underclock?
To test the hypothesis that what you are seeing is a hardware-related issue, you would want to underclock by a significant amount, say 20%. If that does not reduce the error rate to zero, I would consider the hypothesis refuted.
I installed MSI Afterburner, underclocked both the core and memory to 100 MHz, and generated 100 images without any artifacts or glitches. So can I use my GPU underclocked, or is there any other remedy?
Underclocked by 100 MHz or underclocked down to 100 MHz? I hope it is not the latter, as that would cut GPU performance by more than a factor of 10.
I do not know what kind of RTX 3060 you have. There is certainly a whole gamer-oriented market for vendor-overclocked GPUs, that is, GPUs that are clocked higher than specified by NVIDIA right off the production line. The vendors of those GPUs are basically saying “Trust us, this works fine”, but I doubt they validated their products by running CUDA-based software. Graphics workloads and compute workloads tend to stress GPUs in different ways.
For compute applications I therefore warn people away from these products when asked. Maybe you have such a factory-overclocked model, and lowering the clocks (back to NVIDIA standard frequencies or thereabouts) enables the GPU to compute reliably.
There may be firmware-editing tools to make reduced clock and TDP settings permanent. But that is also risky, as it may be possible to permanently brick some cards.
The last time I used such tools was on the GTX 10x0 series, so I am not sure whether they are still viable on the 30x0 and 40x0 series of cards.
If you have a factory-overclocked card, then flashing the BIOS from the standard-clocked card by the same vendor might do the trick.
Ah, it’s a laptop.
You could attempt to figure out whether it’s the memory or the core clocks that cause the issue; it might be enough to underclock just one of them.
Then the performance impact will not be as significant.
Sorry, can’t help you with the tool. The frequencies displayed in the linked image would suggest to me that the GPU is idling except for driving the display. What is of interest, however, is what the frequencies look like when the GPU operates with a full workload.
That makes sense, as (in my limited experience) on modern GPUs the memory subsystem is usually significantly more sensitive to overclocking attempts than the GPU cores.
So it is a memory problem; when I underclocked only the core, I got glitched images.
-50 MHz is the smallest underclock I can apply and still get normal images.
Can I continue to use these settings, or is it fatal?
Underclocking causes no harm at all. It is actually overclocking that can cause long-term damage to semiconductor devices, especially if it also involves raising the operating voltage.
Semiconductors physically age, and this process is accelerated by the higher temperatures that typically result from overclocking. Higher voltages and the resulting higher currents may also accelerate failure mechanisms like electromigration (which can cause shorts, or thinned wires that transmit signals more slowly) and hot-carrier injection (electrons become trapped in transistors, causing them to switch more slowly). However, during the typical lifetime of a PC, around five years, these effects rarely cause actual failures unless the equipment is operated at a near-100% duty cycle.
Oh great. Thanks to both of you for taking an interest in this matter and showing me a way out of this glitch.
On the topic of VBIOS modding to adjust clocks and timings: new tools (omgvflash and nvflashk) are available that appear to finally allow flashing the more modern NVIDIA cards (anything after the Maxwell generation). It took so long to find a loophole because NVIDIA has locked down its BIOSes with digital signatures and additional validation.