I keep encountering a really disturbing issue when using NVidia GPUs for all of my 64-bit Windows GPGPU apps, so I thought that now would be as good a time as any to ask about it.
The issue is twofold.
First, I’ve verified that while the NVidia GPU is executing a kernel, the CPU’s execution speed drops by a whopping 61 percent!!
Second, while the NVidia GPU is executing a kernel, the Windows GUI is completely unresponsive.
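I don't say above exactly how the 61 percent figure was obtained. Assume a fixed CPU workload is timed once with the GPU idle and once while the kernel runs. Speed is proportional to 1/time, so the drop in speed works out as follows. The helper name `percent_slowdown` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: percentage by which execution *speed* dropped, given the
// wall-clock time of the same fixed workload measured without (baseline) and
// with (loaded) the GPU kernel running. Speed ~ 1/time, so the drop from
// 1/baseline to 1/loaded is (1 - baseline/loaded) * 100 percent.
double percent_slowdown(double baseline_seconds, double loaded_seconds) {
    assert(baseline_seconds > 0.0 && loaded_seconds > 0.0);
    return (1.0 - baseline_seconds / loaded_seconds) * 100.0;
}
```

Under that reading, a 61 percent drop means the same CPU work takes roughly 1/0.39 ≈ 2.56 times as long while the kernel is running.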
Now, I know what you’re thinking: that this is obviously some kind of bus contention issue, right?
Well, no, actually. That’s what I thought for the longest time, too. But the application where this is most painfully obvious is one in which the CPU spends 99.9 percent of its execution time operating only on its own registers, with no memory access required. Furthermore, the single GPU kernel also spends 99.9 percent of its time operating only on its own registers.
So 99.9 percent of the time, both the CPU and GPU are accessing only their own respective registers.
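Here is a hypothetical stand-in for a "registers only" CPU loop like the one described above, with a timer around it. It is a sketch, not my actual app code. The names `register_bound_work` and `time_register_bound_work` are illustrative, and the xorshift64 update is just one convenient computation that needs no memory traffic. Whether the compiler really keeps everything in registers should be checked in the generated assembly.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical register-bound CPU workload: all state lives in two local
// variables that an optimizing compiler can keep in registers; the loop
// performs no array, heap, or global-memory loads or stores.
std::uint64_t register_bound_work(std::uint64_t iterations, std::uint64_t seed) {
    assert(seed != 0);  // an xorshift state of zero would stay zero forever
    std::uint64_t x = seed;
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        // One xorshift64 step (Marsaglia): shifts and XORs only.
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        acc += x;
    }
    return acc;  // returning the result keeps the optimizer from deleting the loop
}

// Wall-clock time, in seconds, for one run of the workload above. Calling this
// once with the GPU idle and once while a kernel is running gives the two
// timings needed to estimate the slowdown.
double time_register_bound_work(std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = register_bound_work(iterations, 0x9E3779B97F4A7C15ULL);
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```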
So why should the GPU interfere so badly with the performance of both the CPU and the Windows GUI?
I could understand it if one or more of the GPU’s registers were actually implemented (under the hood) as GPU memory, but that still wouldn’t explain the extreme CPU slowdown, because the CPU, for all intents and purposes, isn’t accessing any memory at all. And I’m not even sure it would adequately explain the effect on the Windows GUI, because the GPU shouldn’t be using the system bus to access its own memory anyway (right?)…
I also thought, at one point, that the running GPU kernel was clobbering the Windows GUI because the GUI was waiting on DirectX, which was in turn waiting for an open slot on the GPU. So I changed the program to use only three quarters of the available (concurrent) GPU threads. But that didn’t change anything…
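Capping a launch at "three quarters of the available (concurrent) GPU threads" comes down to simple arithmetic. This is a sketch under assumptions. `blocks_for_fraction` is a hypothetical helper, and the multiprocessor count and per-multiprocessor thread limit are passed in as plain numbers. In a Driver API program they would come from `cuDeviceGetAttribute` with `CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT` and `CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR`.

```cpp
#include <cassert>

// Hypothetical helper: how many thread blocks of `block_size` threads to
// launch so the grid covers roughly `fraction` of the device's maximum number
// of simultaneously resident threads (sm_count * max_threads_per_sm).
unsigned blocks_for_fraction(unsigned sm_count, unsigned max_threads_per_sm,
                             unsigned block_size, double fraction) {
    assert(block_size > 0);
    assert(fraction > 0.0 && fraction <= 1.0);
    const double resident = static_cast<double>(sm_count) * max_threads_per_sm;
    const unsigned target_threads = static_cast<unsigned>(resident * fraction);
    const unsigned blocks = target_threads / block_size;  // round down to stay under the target
    return blocks > 0 ? blocks : 1;                       // but always launch at least one block
}
```

With illustrative values of 2 multiprocessors and 1536 resident threads each, `blocks_for_fraction(2, 1536, 256, 0.75)` targets 2304 threads, i.e. 9 blocks of 256. Rounding down keeps the grid at or below the requested fraction.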
This isn’t a WDDM issue either. I’ve turned off the WDDM GPU timeout (TDR) by setting the following registry values:
HKLM\System\CurrentControlSet\Control\GraphicsDrivers\TdrLevel = 0, and
HKLM\HARDWARE\DEVICEMAP\VIDEO\MaxObjectNumber = 0,
the latter of which is only necessary for Windows 64-bit, which is what I’m running…
Speaking of which, this is what I’m currently using:
Microsoft Windows 7 Professional 64-bit,
Intel CPU: Dual Core “i5” running @ 2.4 GHz,
NVidia GPU: GeForce GT 525M,
NVidia Optimus: installed,
Installed RAM: 8 GB.
And the specs for the aforementioned app are:
Windows 64-bit C++ app (w/ some assembler),
Framework used: none (“native” Windows API only),
Compiler: Microsoft Visual C++ 2010, version 10.0.40219.1 SP1Rel,
NVidia Driver: nvcuda.dll, version 337.88,
NVidia Interface used: NVidia Driver API only.
So is there something I can do to either mitigate or circumvent this awful situation? Am I missing some critical piece of information? Should my program be doing something it isn’t? Anyone?