I’m currently writing a path tracer so I can study CUDA beyond the toy examples. While I was testing my code yesterday, the computer simply shut down as soon as the program reached its most critical part: launching the kernel that tries to find intersections between the rays and the triangles.
I’m launching one thread per ray and one ray per pixel, which means that my sample program spawns 640x480 = 307200 threads, organized as 300 1D blocks of 1024 threads each. As soon as I run it, my laptop shuts down IF the power supply is unplugged. Reducing the number of rays from 640x480 to 160x120 = 19200 “fixes” it. When the power supply is plugged in, everything goes fine.
This behavior makes me believe there is some problem with power consumption; maybe launching so many rays at the same time (with all of them doing lots of global memory lookups) is too heavy a task and my card draws a lot of current, causing the system to shut down for safety reasons.
I had trouble finding a similar problem in the forums. Have you guys ever had a similar problem? Am I frying my card/motherboard? Is there anything I can do, besides processing fewer rays at a time? I’m afraid I’m slowly killing my card, so I haven’t done further tests yet.
My laptop is an Acer Aspire VX-15, equipped with a GTX 1050 Ti card. If you need any profiling, let me know; I don’t really know what information is useful for solving this.
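For reference, the launch arithmetic described in the question can be checked on the host. This is a minimal sketch in plain C++ (no CUDA runtime calls); the helper names `blocksFor` and `idleThreads` are made up for illustration and are not taken from the original code.

```cpp
#include <cassert>
#include <cstdint>

// Number of 1D blocks needed to cover totalThreads (ceiling division).
// Hypothetical helper, not part of the poster's program.
inline std::uint32_t blocksFor(std::uint32_t totalThreads,
                               std::uint32_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return totalThreads / threadsPerBlock +
           (totalThreads % threadsPerBlock != 0 ? 1u : 0u);
}

// Threads in the last block that have no ray assigned to them.
inline std::uint32_t idleThreads(std::uint32_t totalThreads,
                                 std::uint32_t threadsPerBlock) {
    return blocksFor(totalThreads, threadsPerBlock) * threadsPerBlock -
           totalThreads;
}
```

With 1024 threads per block, 640x480 rays divide evenly into 300 blocks, while 160x120 rays need 19 blocks and leave 256 threads without a ray. A kernel launched that way would therefore need a bounds check along the lines of `if (rayIndex >= rayCount) return;` (variable names assumed).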
Impossible to know for sure by remote diagnosis, but from your description it certainly sounds like a case of the system being unable to deliver sufficient power to the GPU when the GPU is under heavy load.
Since it works fine with the AC power plugged in, the issue appears to be not an insufficiently-sized PSU (a typical scenario reported in these forums), but rather an issue with a weak battery.
I don’t have experience with battery-operated devices. If the battery is exchangeable, maybe try a fresh pack (batteries age with use)? The workaround seems obvious: Keep AC power plugged in while doing heavy GPU processing.
Thanks for the answer, njuffa!
Indeed, I’ve seen that many people have had this issue with desktop computers, but none of them mentioned laptops. Keeping the AC power plugged in is indeed a trivial workaround, but should I worry about damaging my motherboard or my video card while doing this?
Given that these shutdowns occur to protect your hardware, I think that it is unlikely that you will damage the hardware. But sudden abnormal shutdowns may cause your file system to become corrupted, as the operating system isn’t shut down in the proper sequence. You really would want to tackle the root cause of the issue.
All rechargeable batteries that I am familiar with lose capability over time. If the GPU demand exceeds the power supply/battery capability, the laptop is generally designed to detect an “undervoltage” condition and automatically shut itself off immediately, because prolonged operation of electronics in an out-of-spec condition may be problematic.
If your laptop is new, you might consider discussing this with the manufacturer. If it’s older, then it may be something you simply need to live with, i.e. keep it plugged in when the GPU demand is heavy.
I agree with njuffa that the shutdowns should be avoided for OS/disk integrity.
Thank you guys for the answers!
@txbob My laptop is brand new, so yes, I think I’ll email Acer customer service to inquire whether this is normal or not. I would say it is a bad design decision to equip a laptop with a battery that cannot supply all of its components at any stress level.
@njuffa What exactly do you mean by “tackle the root cause of the issue”? Do you mean trying to put a lighter load on the GPU (using fewer threads, for example)?
By the way: is there any guideline to help me write better code so I won’t overload the GPU? I’ve been doing some tests using PyTorch, and although I’ve trained networks with huge datasets (which easily push GPU utilization to 100%), I haven’t experienced this problem. I guess they write better code, so even with heavy tasks the GPU is able to work efficiently.
By root cause I meant “insufficient power when laptop is in battery mode”.
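On the idea of processing fewer rays per launch, raised earlier in the thread: a minimal host-side sketch, assuming the rays can be traced independently so that one large launch can be split into several smaller ones. `RayBatch`, `planBatches` and the 19200-ray batch size are illustrative assumptions, not part of the original program. Nothing in the thread confirms that batching avoids the shutdown; the replies point to battery power delivery as the root cause.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One planned kernel launch covering a contiguous range of rays.
// Hypothetical type, introduced only for this sketch.
struct RayBatch {
    std::uint32_t firstRay;  // index of the first ray in this batch
    std::uint32_t rayCount;  // number of rays handled by this batch
    std::uint32_t blocks;    // 1D grid size needed for this batch
};

// Split totalRays into batches of at most maxRaysPerLaunch rays each.
std::vector<RayBatch> planBatches(std::uint32_t totalRays,
                                  std::uint32_t maxRaysPerLaunch,
                                  std::uint32_t threadsPerBlock) {
    assert(maxRaysPerLaunch > 0 && threadsPerBlock > 0);
    std::vector<RayBatch> batches;
    std::uint32_t first = 0;
    while (first < totalRays) {
        // The last batch may be smaller than the others.
        const std::uint32_t count =
            std::min(maxRaysPerLaunch, totalRays - first);
        // Ceiling division: enough blocks to cover every ray in the batch.
        const std::uint32_t blocks =
            count / threadsPerBlock + (count % threadsPerBlock != 0 ? 1u : 0u);
        batches.push_back(RayBatch{first, count, blocks});
        first += count;
    }
    return batches;
}
```

Each entry would drive one kernel launch with `blocks` blocks of `threadsPerBlock` threads, passing `firstRay` and `rayCount` to the kernel so it can offset its ray index and skip out-of-range threads. For 307200 rays and batches of 19200, this plans 16 launches of 19 blocks each.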