CUDA application crashes

Hello,

I’m not very experienced in CUDA, and I now have a program that crashes, but I can’t understand why.

The code doesn’t make much sense any more because I changed some parts in order to locate the bug.

#define ANZAHL_GAUSE 4
#define ANZAHL_FELDER 1460
#define PI 3.141592654f
#define ANZAHL_SCHRITTE 50000
#define dt 0.005f
#define t0 -125.0f

struct Feld
{
	float widthSQ2[ANZAHL_FELDER];
	float amplitude[ANZAHL_FELDER];
	float freq[ANZAHL_FELDER];
	float phase[ANZAHL_FELDER];
};

Feld Daten[ANZAHL_GAUSE];

texture<float> d_Daten;

float EFeld_Ausgabe[ANZAHL_SCHRITTE*ANZAHL_FELDER];
float d_EFeld_Ausgabe[ANZAHL_SCHRITTE*ANZAHL_FELDER];

__device__ float d_EFeld(float t, short index)
{
	float E = 0.0f;
	float Wert;
	for (int i = 0; i < ANZAHL_GAUSE; i++)
	{
		// Wert = freq[index]*t + phase[index] of the i-th Gaussian
		Wert = tex1Dfetch(d_Daten, i*sizeof(Feld)/sizeof(float)+2*ANZAHL_FELDER+index)*t+
				tex1Dfetch(d_Daten, i*sizeof(Feld)/sizeof(float)+3*ANZAHL_FELDER+index);
		// Gaussian envelope exp(-t^2/widthSQ2) times amplitude times cosine
		E += __expf(-t*t/tex1Dfetch(d_Daten, i*sizeof(Feld)/sizeof(float)+index))*
				tex1Dfetch(d_Daten, i*sizeof(Feld)/sizeof(float)+ANZAHL_FELDER+index)*__cosf(Wert);
	}
	return E*0.001f;
}

__global__ void kernel(float* Pointer)
{
	short tid = threadIdx.x + blockIdx.x * blockDim.x; // index of the field
	float t = t0;
	float dummy;
	if(tid < ANZAHL_FELDER)
	{
		for(int i = 0; i < ANZAHL_SCHRITTE; i++)
		{
			dummy = d_EFeld(5000.0f, tid);
			Pointer[0] = dummy*t;
			Pointer[tid+i*ANZAHL_FELDER] = 5.0;
			t+=dt;
		}
	}
}

---------------------------------------------

	float* dev_E;    // the computed results
	float* dev_Data; // the Gaussians - gets bound to a texture!

	cutilSafeCall(cudaMalloc( (void**)&dev_Data, sizeof(Feld)*ANZAHL_GAUSE));
	cutilSafeCall(cudaMemcpy(dev_Data, Daten, sizeof(Feld)*ANZAHL_GAUSE, cudaMemcpyHostToDevice));
	cutilSafeCall(cudaBindTexture( NULL, d_Daten, dev_Data, sizeof(Feld)*ANZAHL_GAUSE));
	cutilSafeCall(cudaMalloc( (void**)&dev_E, sizeof(float)*ANZAHL_FELDER*ANZAHL_SCHRITTE));

	kernel<<<ANZAHL_FELDER, 1>>>(dev_E);

	cutilSafeCall(cudaMemcpy(d_EFeld_Ausgabe, dev_E, sizeof(float)*ANZAHL_FELDER*ANZAHL_SCHRITTE, cudaMemcpyDeviceToHost));
	cutilSafeCall(cudaFree( dev_E ));
	cutilSafeCall(cudaUnbindTexture( d_Daten ));
	cutilSafeCall(cudaFree(dev_Data));

I’m using a GeForce 8800 GT and Visual Studio 2008. The code shown above crashes: I get a black screen for a few seconds while the driver restarts.

I get the message “e:/cppIntegration/cppIntegration.cu(190) : cudaSafeCall() Runtime API error : the launch timed out and was terminated.” in the console.

Line 190 is:

cutilSafeCall(cudaMemcpy(d_EFeld_Ausgabe , dev_E, sizeof(float)*ANZAHL_FELDER*ANZAHL_SCHRITTE,cudaMemcpyDeviceToHost));

This doesn’t help me: why is there a timeout when the program only ran for a few seconds? I changed the code in the kernel slightly, and the effect was that the application no longer crashes:

for(int i = 0; i < ANZAHL_SCHRITTE; i++)
{
	dummy = d_EFeld(5000.0, tid);
	Pointer[0] = dummy*t;
	//Pointer[tid+i*ANZAHL_FELDER] = 5.0;
	t+=dt;
}

or

for(int i = 0; i < ANZAHL_SCHRITTE; i++)
{
	dummy = d_EFeld(5000.0, tid);
	Pointer[0] = dummy;//*t;
	Pointer[tid+i*ANZAHL_FELDER] = 5.0;
	t+=dt;
}

…and so on; there are some more configurations that avoid the crash (e.g. changing the 5000.0 to 0.0).

I don’t understand why these changes prevent the crash; e.g. the last version only replaces Pointer[0] = dummy*t; with Pointer[0] = dummy;//*t;

Do you know what I’m doing wrong?

Thank you very much!

Andy

Your kernel did not actually crash. It is as the message says: since the screen does not get updated while a kernel runs, there is a watchdog timer to protect against runaway kernels that would otherwise render your system unusable through a frozen screen (even though everything else is still running).
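This also explains why the error is reported only at the cudaMemcpy() in line 190: kernel launches are asynchronous, so the timeout surfaces at the next call that synchronizes with the device. A minimal sketch of how you could pin the error to the launch itself (cudaThreadSynchronize() is the synchronizing call of that CUDA generation; the printf assumes <cstdio> is included):

kernel<<<ANZAHL_FELDER, 1>>>(dev_E);
cudaError_t err = cudaGetLastError(); // reports launch-configuration errors
if (err == cudaSuccess)
	err = cudaThreadSynchronize(); // blocks until the kernel finishes; a watchdog timeout surfaces here
if (err != cudaSuccess)
	printf("kernel failed: %s\n", cudaGetErrorString(err));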

Possible solutions are: splitting the work into multiple smaller kernel launches (conveniently done by just invoking the kernel with a smaller grid size) so that the screen gets a chance to update in between (which also resets the watchdog timer); optimizing the kernel so that it finishes before the watchdog timer expires; or setting the watchdog timeout to a larger value (not recommended).
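To illustrate the first option with your kernel: each block handles one field, so you can pass the kernel an offset and launch it several times on a slice of the fields. A rough sketch, assuming a made-up chunk size FELDER_PRO_LAUNCH and a new feldOffset parameter:

#define FELDER_PRO_LAUNCH 146 // hypothetical: 1/10 of ANZAHL_FELDER per launch

__global__ void kernel(float* Pointer, int feldOffset)
{
	int tid = feldOffset + threadIdx.x + blockIdx.x * blockDim.x; // field handled in this slice
	float t = t0;
	float dummy;
	if(tid < ANZAHL_FELDER)
	{
		for(int i = 0; i < ANZAHL_SCHRITTE; i++)
		{
			dummy = d_EFeld(5000.0f, tid);
			Pointer[0] = dummy*t;
			Pointer[tid+i*ANZAHL_FELDER] = 5.0f;
			t += dt;
		}
	}
}

// Host side: ten short launches instead of one long one, giving the
// driver (and thus the watchdog timer) a chance in between.
for (int offset = 0; offset < ANZAHL_FELDER; offset += FELDER_PRO_LAUNCH)
	kernel<<<FELDER_PRO_LAUNCH, 1>>>(dev_E, offset);

Each launch then runs only 146 of the 1460 blocks, so it finishes in roughly a tenth of the time.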

I see that you launch the kernel with just a single thread per block, which wastes 31 of the 32 threads in each warp. A simple optimization would be to use a block size of 32 (and share the work between the threads in a useful way), which might already be enough to avoid triggering the watchdog timer altogether.
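Concretely, since your kernel already computes a global tid and guards it with if(tid < ANZAHL_FELDER), only the launch configuration needs to change; a sketch with a rounded-up grid size:

// 32 threads per block; round the grid size up so all 1460 fields are covered.
// The existing tid guard in the kernel handles the partially filled last block.
int threadsPerBlock = 32;
int blocks = (ANZAHL_FELDER + threadsPerBlock - 1) / threadsPerBlock; // = 46 blocks
kernel<<<blocks, threadsPerBlock>>>(dev_E);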

Thank you very much!!

I didn’t know that CUDA kernels aren’t allowed to run for longer than 5 seconds (on Windows).

Yes, I usually use 256 threads per block, but I just wanted to see how the performance decreases when I use fewer threads per block. The worst case (only one thread per block) was aborted, and I didn’t understand why, because the limit on the grid size was not reached.

So I have to take care that long calculations are split into several kernel invocations.