NDA expiration - new GF100 information

CapJo · January 19, 2010, 1:53pm

I’m aware that you can also have “silent segmentation faults” on the CPU. That’s why I wrote about data that was never malloc’ed on the GPU. When you access unallocated data on the CPU you will always get a segmentation fault; that’s the difference.

Doing error checking by hand is an option, but you must assume that your error-checking code is absolutely correct. For “static algorithms” whose control flow does not depend on user input you will catch all errors, but you have a problem when the execution of your algorithm is influenced by user input. In that case you can’t statically check for errors.
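To make the "error checking by hand" option concrete, here is a minimal sketch of a kernel whose write index depends on run-time input and is checked before every store. The names `fillChecked`, `userOffset` and `errorFlag` are made up for illustration, and return codes of the API calls are ignored for brevity. It only shows the pattern; it does not address the concern that the checking code itself has to be correct.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical kernel: the write index depends on a run-time value, so each
// thread validates it and raises a flag instead of writing out of bounds.
__global__ void fillChecked(float* data, int n, int count, int userOffset, int* errorFlag)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= count) return;                  // surplus threads of the last block

    long long i = (long long)t + userOffset; // index influenced by user input
    if (i < 0 || i >= n) {
        *errorFlag = 1;                      // all writers store the same value
        return;
    }
    data[i] = 1.0f;
}

int main(int argc, char** argv)
{
    const int n = 1024;                      // elements allocated
    const int count = 256;                   // elements the kernel should write
    int userOffset = (argc > 1) ? atoi(argv[1]) : 0;

    float* d_data = NULL;
    int* d_flag = NULL;
    int h_flag = 0;

    cudaMalloc((void**)&d_data, n * sizeof(float));
    cudaMalloc((void**)&d_flag, sizeof(int));
    cudaMemset(d_flag, 0, sizeof(int));

    fillChecked<<<(count + 127) / 128, 128>>>(d_data, n, count, userOffset, d_flag);

    // The blocking copy waits for the kernel to finish before reading the flag.
    cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);
    printf(h_flag ? "index out of range detected\n" : "all writes in range\n");

    cudaFree(d_flag);
    cudaFree(d_data);
    return h_flag;
}
```

Running it with an offset such as 900 should trip the flag, because 900 + 255 exceeds the 1024 allocated elements, while an offset of 0 stays in range.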

The best option would be some kind of memory protection in hardware, or at least in the debugger.

NVIDIA is trying to make development easier, and the C++ support shows that they want to make CUDA development nearly as easy as development on the CPU. A reasonable next step would be hardware debugging support and memory protection. All I want is information about that.

Thank you for your replies and your suggestions!

jack · January 19, 2010, 1:59pm

http://www.youtube.com/watch?v=iUouQy7Ohus

Also, if you’re just worried about your end product (the medical device), then I’d run a stripped-down Linux distribution on an Ion-based board. So you get stability, low power usage, and CUDA acceleration (not much, but maybe enough for what you’re doing). Another option is to build your device normally, then host some compute-only servers with Teslas or whatever in the building, and send the data there for computation.

In any case, the debugging tools available now for CUDA (those provided by nVidia, and third-party emulators like Ocelot and barra) should let you write code that is rock-solid and isn’t going to crash the system. The majority of the bugs I’ve seen people have (or had myself) had to do with out-of-bounds memory accesses, which you can find pretty easily with valgrind (or one of the emulators above). Also, for maximum stability, run some console-only Linux distribution…the next biggest group of crashes involves kernels that run too long for the display watchdog and cause the driver to reset.
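As a rough illustration of the watchdog point, one common way to keep each launch short is to split a large job into several smaller launches. The sketch below assumes a hypothetical kernel `addOne` and an arbitrary chunk size; it also uses `cudaDeviceSynchronize`, which comes from toolkits newer than the ones discussed in this thread (older releases used `cudaThreadSynchronize`).

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: processes the elements in [offset, end).
__global__ void addOne(float* data, int offset, int end)
{
    int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end) data[i] += 1.0f;
}

int main(void)
{
    const int n = 1 << 20;        // total number of elements
    const int chunk = 1 << 16;    // elements per launch (arbitrary choice)
    const int threads = 256;

    float* d_data = NULL;
    if (cudaMalloc((void**)&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Many short launches instead of one long one, so no single kernel
    // occupies the display GPU for long.
    for (int offset = 0; offset < n; offset += chunk) {
        int remaining = n - offset;
        int count = remaining < chunk ? remaining : chunk;
        int blocks = (count + threads - 1) / threads;

        addOne<<<blocks, threads>>>(d_data, offset, offset + count);

        cudaError_t status = cudaDeviceSynchronize();
        if (status != cudaSuccess) {
            fprintf(stderr, "chunk at %d: %s\n", offset, cudaGetErrorString(status));
            break;
        }
    }

    cudaFree(d_data);
    return 0;
}
```

How small the chunks need to be depends on the kernel and the GPU, so the chunk size here is only a placeholder.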

CapJo · January 19, 2010, 3:02pm

This YouTube video shows an internal Windows 7 bug, and the crash is probably caused by operating system functions. I never said Windows can’t crash, but try to write a matrix multiplication that crashes your PC using only user-mode instructions and a certain number of memory allocations. You won’t be able to do that on the CPU.

The advantage of GPU computing compared to an additional application acceleration device (Tesla card, Cell card) is that you already have this device in your PC for 3D graphics output such as volume rendering. So it makes sense to use it for acceleration as well, without extra cost.

Using an Atom processor on an Ion platform would decrease general performance, and a Core i7 would probably be much faster than the CUDA cores on the Ion GPU. Additional Tesla servers would raise the costs, and copying big data sets over a network connection would be slow. Copying data from PC RAM to the GPU is already a performance problem. What benefit would I get? The Windows application I’m talking about already exists and has grown over many years. Switching to Linux is therefore impossible.

Due to these constraints I have to stay with Windows, and I have only one GPU. In the future it will be a Fermi card.

The debugging support on Windows is very limited (emulation mode). Unfortunately Valgrind is only available for Linux, but I will install a virtual machine and do some error checking with Valgrind. This solution is, however, complicated and time-consuming. It is possible, of course, but productivity suffers.

Is it now possible with Nexus to debug on a PC with a single device?

Fermi will be able to run multiple kernels concurrently, so the problem arises that one kernel can overwrite another’s data. This must be handled somehow.

_Big_Mac · January 19, 2010, 10:33pm

IIRC memory is protected on the GPU and the addresses are virtual. A GPU segfault should never bring the system down.

I know it has happened, at least on older drivers. Is it still an issue? Does someone have a minimal repro that hangs the system?

I’ve actually just written an app in which I intentionally go beyond array bounds. Kernel invocation returns an error (caught by cudaGetLastError) and either the screen blips momentarily or it goes black for a couple of seconds to return after a while with Windows saying that the driver has stopped responding and successfully recovered. I can relaunch the application after this.

This is on Windows 7 x64. I know Windows XP couldn’t handle a driver crash gently but it seems it’s not that bad now. It’s still nasty that a kernel error can bring the driver down…
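The error path described in this post can be sketched roughly as follows. The kernel `touch` and the helper `report` are hypothetical names, and the kernel is deliberately harmless. The point is only where errors surface: `cudaGetLastError` right after the launch reports launch problems, while a fault during execution is reported by a later synchronizing call. `cudaDeviceSynchronize` is assumed here; it belongs to toolkits newer than those in this thread, where `cudaThreadSynchronize` played that role.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical, well-behaved kernel used only to show the checking pattern.
__global__ void touch(float* f, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) f[i] = 0.0f;
}

// Prints the error string and returns 1 if the status signals a failure.
static int report(const char* where, cudaError_t status)
{
    if (status != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", where, cudaGetErrorString(status));
        return 1;
    }
    return 0;
}

int main(void)
{
    const int n = 256;
    float* d_data = NULL;

    if (report("cudaMalloc", cudaMalloc((void**)&d_data, n * sizeof(float))))
        return 1;

    touch<<<1, n>>>(d_data, n);

    // Launch-time problems (for example an invalid configuration) show up here.
    int failed = report("launch", cudaGetLastError());

    // Faults raised while the kernel runs show up at the next synchronizing call.
    failed |= report("execution", cudaDeviceSynchronize());

    // After a faulting kernel, later calls such as cudaFree may fail as well.
    failed |= report("cudaFree", cudaFree(d_data));
    return failed;
}
```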

jack · January 19, 2010, 10:45pm

Big_Mac is right…the address spaces are virtual, and allocated per-context, so two (or more) concurrent kernels shouldn’t be able to write into each other’s memory.

CapJo · January 19, 2010, 11:40pm

While developing a volume segmentation algorithm I had a number of PC crashes. The last one occurred on Monday. The desktop freezes, and occasionally I get strange colors on my screen. Then I have to reset my PC, although it has also happened that the PC started working again after I waited a few minutes. This is all unpredictable. Starting the same kernel multiple times can lead either to a complete crash or to the error message “Unspecified launch failure”.

I observed this behavior on Windows XP x64 with Quadro Drivers. Might this problem be caused by the combination of driver and operating system?

tmurray · January 20, 2010, 12:26am

If you’ve got something that can reliably crash your machine with the latest drivers (196.21 on Windows now, I guess), then you should post a repro case.

Sarnath · January 20, 2010, 3:54am

Not really… It all boils down to page granularity.

Page protection has a minimum granularity of 4 KB (and a maximum of 4 MB on Intel). Say you have a static array “int array[100]” in your data segment. If we assume “array” is page-aligned, then you can still access the entire 4 KB page without getting any faults on the CPU.
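The arithmetic behind this example can be written out as a small host-only sketch. It assumes a 4 KB page and that the array starts exactly on a page boundary, as in the post. With a 4-byte `int`, the array occupies 400 bytes, which leaves 3696 bytes (924 more `int` slots) on the same page that hardware page protection alone would not flag.

```cuda
#include <stdio.h>

int main(void)
{
    // Assumptions taken from the post: 4 KB pages, page-aligned int array[100].
    const unsigned pageBytes  = 4096;
    const unsigned arrayBytes = 100 * (unsigned)sizeof(int);

    // Bytes after the end of the array that still lie on the same page.
    const unsigned slackBytes = pageBytes - arrayBytes;

    printf("array occupies %u bytes\n", arrayBytes);
    printf("unprotected slack on the page: %u bytes (%u ints)\n",
           slackBytes, slackBytes / (unsigned)sizeof(int));
    return 0;
}
```

The sketch only computes sizes. It does not actually read or write past the array, since doing so is undefined behavior even when no fault occurs.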

I am SURE the GPU has memory protection. I have written kernels probing memory using arbitrary addresses and have seen segfaults… It may be that GPU memory pages are “huge”, which would leave a lot of scope for silent faults…

Since GPU addresses are context-based, this should NOT affect other contexts… If you are having such a problem, it should be reported as Tim suggested.

CapJo · January 20, 2010, 10:51am

Win XP x64 SP 2

GPU: Quadro FX 4800

Quadro Driver 191.00

CUDA Version: 2.3

[attachment=15430:sysinfo.jpg]

This Kernel crashes my system reliably.

#include <stdio.h>
#include <cuda_runtime.h>
#include <limits.h>

// Deliberately writes far beyond the allocation: i counts down from 0,
// so -i walks f[0], f[1], ... up to f[INT_MAX].
__global__ void
killGPU(float* f)
{
	for(int i=0; i > INT_MIN; i--)
		f[-i] = 0;
}

int main(int argc, char** argv)
{
	cudaError_t status;
	float* d_data;

	// Only 1024 bytes (256 floats) are actually allocated.
	status = cudaMalloc((void**) &d_data, 1024);
	if(status != cudaSuccess)
	{fprintf(stderr, "%s\n", cudaGetErrorString(status));}

	// Launch the faulting kernel repeatedly and print any reported error.
	for(int i= 0; i < 100; i++)
	{
		killGPU<<<256, 512>>>(d_data);
		status = cudaGetLastError();
		if(status != cudaSuccess)
		{fprintf(stderr, "%s\n", cudaGetErrorString(status));}
	}

	status = cudaFree(d_data);
	if(status != cudaSuccess)
	{fprintf(stderr, "%s\n", cudaGetErrorString(status));}
}

Sarnath, you are right with your example of page protection. Maybe it’s an issue with the Quadro driver. I will now install the latest GeForce driver and test whether the crash still occurs.

CapJo · January 20, 2010, 12:25pm

It seems to be a Quadro driver issue, at least with the version (191.00) I used. This was the latest Quadro driver at the time I installed CUDA 2.3.

With the latest GeForce driver, 196.21, I wasn’t able to crash my system. It freezes for a few seconds but recovers after that.

This makes things much better. I thought it was a general problem.

The only thing missing is better debugging support on the device under Windows, but will Nexus provide that?

CapJo · January 20, 2010, 2:40pm

I have now tested the GeForce driver 196.21. So far I haven’t managed to crash the PC, but after the recovery the driver does not seem to work correctly.

FurMark 1.65 (and my application) show a broken display output.

Here are the screens:

Before starting my killGPU kernel

[attachment=15431:fur_mark…recovery.jpg]

After starting my killGPU kernel

[attachment=15432:fur_mark…recovery.jpg]

I have also tested the latest Quadro driver, 191.78, and it crashed my PC. Not on every run, but after several runs of the test kernel.

Cygnus_X1 · January 20, 2010, 3:19pm

From time to time (like once every 100 crashes) I see similar artifacts ;)

CapJo · January 20, 2010, 4:18pm

It probably depends on what you have overwritten with your kernel. I tried it several times and I always get these artifacts.

_Big_Mac · January 20, 2010, 7:39pm

I can confirm that it gives me artifacts even on Windows 7. It doesn’t kill the system though, I get the popup saying the driver died and was restored. I get all kinds of random pixels on my desktop afterwards and it seems the only way to clear it is a reboot. Good catch!

Could we have a comment on this by NVIDIA?

tmurray · January 20, 2010, 9:40pm

Gave this a quick try on a newer driver on my Server 2008 machine with a G84 and a GT200, and while the G84 eventually timed out there was still no display corruption. I’ll poke around a bit more later, but at the moment I haven’t been able to repro it.

CapJo · January 21, 2010, 10:04am

On Windows XP x64 I get no display corruption on the Windows desktop itself, only when I start a 3D application (FurMark, for example). The display corruption on Windows 7 probably comes from Aero, which uses 3D. This is my guess.

_Big_Mac · January 21, 2010, 10:20am

Indeed, I use Aero.

SPWorley · January 21, 2010, 9:01pm

Another good article came out… it includes more details but also more speculation.

The eye-opening quote from page 5 of this article is that the consumer Fermi may have its DP throughput reduced by 75%… the DP powerhouse would be reserved for Tesla.
This is unconfirmed… it’s the first I’ve seen of it, anyway.

seibert · January 21, 2010, 9:16pm

Sad, but if true, I’m not surprised. I’m certain the GeForce has cannibalized Tesla sales for workstation customers. Double precision is a reasonable way to segment the market, since the majority of the GeForce market doesn’t care about double precision anyway. (Although I would argue with the author that more than just hackers in Eastern Europe are bummed about this. Not all of us have the budget to spend $2k per card when we would like a dozen of them.)

The only bonus would be if this increases the yield of chips going to the GeForce (perhaps by allowing chips with defective double-precision units to be sold) and therefore helps lower the price. If this turns out to be purely a limitation enforced in firmware with no yield benefits, then that will be very depressing.

Cygnus_X1 · January 22, 2010, 1:42am

I use Windows 7 64bit but I have my Aero disabled and it looks more or less like my old Windows 98. (I like simple rectangular windows without any fancy stuff). Nevertheless I believe that some basic 3D system might be still online, even if not really used…
