CUDA freezes computer

narcis · May 6, 2009, 2:51pm

Hi.

I’m programming a CUDA kernel that computes something related to computational geometry from a set of points in the plane. The program divides the plane into a two-dimensional grid (a sort of window made of pixels). For each of these “pixels”, the CUDA kernel performs a double loop over all the points in the initial set, so the algorithm has O(n²) complexity, where n is the number of points.

The CUDA kernel looks something like this:

__global__ void CUDA_kernel(float* points_list, int number_of_points, float* result)
{
    for (int i = 0; i < number_of_points; i++)
    {
        for (int j = i + 1; j < number_of_points; j++)
        {
            // (do something) with points i and j
        }
    }
    result[index] = something;  // index: this thread's pixel; something: its computed value
}
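The post does not show how index is computed. This sketch assumes one thread per pixel and a two-dimensional launch, which the post does not state. Under that assumption, a common mapping is x = blockIdx.x * blockDim.x + threadIdx.x and y = blockIdx.y * blockDim.y + threadIdx.y, with index = y * width + x. Threads that fall outside the pixel grid are skipped. Below is a host-side C++ model of that arithmetic. The helper pixel_index and the names width and height are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: linear "pixel" index for one thread of a 2D launch.
// bx, by = blockIdx.x/y; tx, ty = threadIdx.x/y; bdx, bdy = blockDim.x/y.
// Returns -1 for padding threads that fall outside the width x height grid;
// in the kernel such a thread would return before writing result[index].
int pixel_index(int bx, int by, int tx, int ty, int bdx, int bdy,
                int width, int height) {
    assert(bdx > 0 && bdy > 0);
    int x = bx * bdx + tx;  // column of the pixel handled by this thread
    int y = by * bdy + ty;  // row of the pixel handled by this thread
    if (x >= width || y >= height) return -1;  // outside the pixel grid
    return y * width + x;   // row-major index into result[]
}
```

Inside the kernel the same lines would use the built-in blockIdx, blockDim and threadIdx variables directly.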

The problem appears (I only suppose) when n grows and this double loop causes the total number of instructions inside the CUDA kernel to exceed a certain limit. I can see in the reference manual that a CUDA kernel can have at most 2 million native instructions. The PC freezes completely… I can still move the mouse, but I can do nothing except reset the machine.
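A larger n changes the picture sharply. Per pixel, the body of the double loop runs n(n-1)/2 times, so doubling n roughly quadruples the work of every thread. The compiled kernel, however, stays the same size. As we understand the Programming Guide, the 2-million native instruction figure limits the size of the compiled kernel, not the number of instructions executed. On that reading, the growing run time per launch is the more likely suspect. Below is a small host-side C++ sketch that counts those iterations; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Iterations of the inner loop body for ONE thread (one pixel):
// i runs over [0, n) and j over (i, n), which gives n(n-1)/2 pairs.
std::uint64_t pair_iterations(std::uint64_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}

// Loop-body executions summed over all pixels, assuming one thread per pixel.
std::uint64_t total_iterations(std::uint64_t pixels, std::uint64_t n) {
    std::uint64_t per_pixel = pair_iterations(n);
    // Guard against 64-bit overflow for very large problems
    assert(per_pixel == 0 || pixels <= UINT64_MAX / per_pixel);
    return pixels * per_pixel;
}
```

For example, pair_iterations(2000) is 1,999,000, while pair_iterations(4000) is 7,998,000. Twice the points means about four times the loop iterations per pixel.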

Thinking that the problem was the number of instructions, I split the CUDA kernel into smaller loops, something like this:

__global__ void CUDA_kernel(float* points_list, int number_of_points, float* result, int partition, int elements_in_partition)
{
    int begin = partition * elements_in_partition;
    for (int i = begin; i < begin + elements_in_partition; i++)
    {
        for (int j = i + 1; j < number_of_points; j++)
        {
            // (do something) with points i and j
        }
    }
    result[index] = something;
}

.....

elements_in_partition = something;
for (int partition = 0; partition < number_of_partitions; partition++)
{
    // The execution configuration is <<<grid dimension, block dimension>>>
    CUDA_kernel<<<grid_dim, block_dim>>>(..., partition, elements_in_partition);
}

....
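The partitions in this version hold equal numbers of rows but not equal work. Row i performs number_of_points - 1 - i inner iterations, so the first launch does the most. Below is a host-side C++ sketch that models each launch; the helper names are hypothetical. It clamps the last partition to number_of_points and uses ceiling division to get the number of launches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Inner-loop iterations executed by one thread in launch `partition`:
// rows i in [begin, end), each contributing (n - 1 - i) iterations of j.
std::uint64_t iterations_in_partition(int n, int partition, int elements_in_partition) {
    assert(elements_in_partition > 0);
    int begin = partition * elements_in_partition;
    int end = std::min(begin + elements_in_partition, n);  // clamp the last partition
    std::uint64_t total = 0;
    for (int i = begin; i < end; ++i)
        total += static_cast<std::uint64_t>(n - 1 - i);
    return total;
}

// Launches needed so that every row i in [0, n) is covered exactly once.
int number_of_partitions(int n, int elements_in_partition) {
    assert(elements_in_partition > 0);
    return (n + elements_in_partition - 1) / elements_in_partition;  // ceiling division
}
```

Take n = 10 and elements_in_partition = 3. This gives four launches doing 24, 15, 6 and 0 iterations. Their sum is 45, the same number of pairs as the unpartitioned loop. The first launch is therefore the one most likely to run too long.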

But, surprisingly, the PC still freezes. Do you think the problem is the number of instructions inside the CUDA kernel? If that were the problem, why does the program run well with a smaller number of points (n), even with more instructions than in the second version posted here?

Could it be anything else? Do you have any idea? :)

Thanks in advance.

Jamie_K · May 7, 2009, 8:00am

I highly doubt it is the number of instructions. It is very likely that you’re running into the watchdog timer of the OS.

Search the forums for watchdog timer for more information.
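If the watchdog is the cause, what matters is how long each individual launch runs. A partitioning scheme should therefore bound the work per launch rather than the number of rows. The sketch below shows one way to do that. It assumes the kernel is changed to take a begin row and an end row instead of partition and elements_in_partition. The helper budget_boundaries and its max_pairs budget are hypothetical. Iteration counts are only a proxy for run time, so a safe budget would have to be found by measurement.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: split rows i in [0, n) into consecutive ranges so that
// no launch executes more than max_pairs inner iterations per thread.
// Returns boundaries b0 = 0 < b1 < ... < bk = n; launch t covers [b_t, b_{t+1}).
// A row whose own work exceeds max_pairs still gets a range of its own.
std::vector<int> budget_boundaries(int n, std::uint64_t max_pairs) {
    assert(n >= 0 && max_pairs > 0);
    std::vector<int> bounds{0};
    std::uint64_t current = 0;  // iterations accumulated in the open range
    for (int i = 0; i < n; ++i) {
        std::uint64_t row = static_cast<std::uint64_t>(n - 1 - i);  // work of row i
        if (current > 0 && current + row > max_pairs) {
            bounds.push_back(i);  // close the open range before row i
            current = 0;
        }
        current += row;
    }
    if (bounds.back() != n) bounds.push_back(n);
    return bounds;
}
```

Take n = 10 and max_pairs = 10. The boundaries are 0, 1, 2, 3, 4, 6, 10. The early, expensive rows get launches of their own, and the cheap tail rows are grouped together.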

narcis · May 7, 2009, 2:31pm

I’ve read about the watchdog timer in your link, but in those threads people say that their CUDA kernels return an error after a certain number of seconds. I don’t get any error message; my computer just freezes and I have to reset it completely.
Moreover, I am using Linux, and I launch the program from a console.

_Big_Mac · May 8, 2009, 10:18am

While the watchdog should give a nice error message, more often than not it bluescreens your computer. At least that’s how it works with pre-185 drivers; I haven’t checked the newer ones yet.

narcis · May 11, 2009, 3:47pm

I don’t get an error message or a bluescreen. The computer just freezes completely, as if the CUDA kernel were still computing. The problem is that the CUDA kernel never “ends”, and the computer stays frozen (display, keyboard input…) until I reset it. The time until it freezes varies from one PC to another, so I suspect it is not related to the computing time.

I’m using the 180.22 drivers and openSUSE 10.2 on a GeForce GTX 280.

_Big_Mac · May 12, 2009, 10:41am

Well, in my case it would freeze completely for 10+ seconds and then bluescreen.

Have you tried the new drivers?

Mathieu_Lamarre · May 12, 2009, 4:05pm

Reduce the size of your problem and run it in debug and emulation mode (-deviceemu -D_DEVICEEMU /Od etc.).

A simple access violation inside your kernel could cause this problem. Emulation with debug information should detect the source of the problem.

I triggered the exact same problem on XP32 by deliberately reading beyond a global memory buffer. Everything freezes, except that the mouse can still move a little.
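A typical way to read beyond a buffer is to round the grid up to whole blocks and then forget to guard the extra threads. Below is a host-side C++ model of a 1D launch; the helper name is hypothetical. It counts how many threads would read past the end of a buffer of n elements, with and without the usual if (index < n) guard.

```cpp
#include <cassert>

// CPU model of a 1D launch: grid_dim blocks of block_dim threads, each thread
// reading data[blockIdx.x * blockDim.x + threadIdx.x] from a buffer of length n.
// Returns how many threads would read out of bounds.
int out_of_bounds_reads(int grid_dim, int block_dim, int n, bool guarded) {
    assert(grid_dim > 0 && block_dim > 0 && n >= 0);
    int bad = 0;
    for (int block = 0; block < grid_dim; ++block)
        for (int thread = 0; thread < block_dim; ++thread) {
            int index = block * block_dim + thread;
            if (index >= n) {
                // A guarded thread returns here without touching memory;
                // an unguarded one would perform an illegal global read.
                if (!guarded) ++bad;
            }
        }
    return bad;
}
```

With 4 blocks of 256 threads and n = 1000, the unguarded launch has 24 threads reading past the end. With the guard it has none.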

tonhead · November 30, 2009, 9:55am

What if I have already double-checked my kernel for such errors? I suspect the problem could also be caused by the kernel using too many resources…

ori.kushnir · November 8, 2018, 11:03pm

The kernel from the tutorial freezes my computer as follows:

When executed, the process hangs. If I try to open Task Manager and kill it, the computer freezes. If I don’t, the computer freezes after a few minutes. This is using a GTX 960M, Windows 10, and the latest driver.

#include <iostream>
#include <math.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main(void)
{
    int N = 1<<20;
    float *x, *y;

    // Allocate Unified Memory – accessible from CPU or GPU
    cudaMallocManaged(&x, N*sizeof(float));
    cudaMallocManaged(&y, N*sizeof(float));

    // initialize x and y arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Run kernel on 1M elements on the GPU
    add<<<1, 256>>>(N, x, y);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i]-3.0f));
    std::cout << "Max error: " << maxError << std::endl;

    // Free memory
    cudaFree(x);
    cudaFree(y);

    return 0;
}
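The tutorial kernel computes index and stride from threadIdx.x and blockDim.x only. That is correct for its single-block launch add<<<1, 256>>>. With more than one block, however, every block would process the same elements. The usual grid-stride form adds blockIdx.x and gridDim.x. Below is a host-side C++ model that counts how often each element is visited; the helper name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// CPU model of the grid-stride loop for a launch with grid_dim blocks:
//   int index  = blockIdx.x * blockDim.x + threadIdx.x;
//   int stride = blockDim.x * gridDim.x;
//   for (int i = index; i < n; i += stride) y[i] = x[i] + y[i];
// Returns how many times each element would be visited.
std::vector<int> grid_stride_visits(int grid_dim, int block_dim, int n) {
    assert(grid_dim > 0 && block_dim > 0 && n >= 0);
    std::vector<int> visits(n, 0);
    int stride = block_dim * grid_dim;
    for (int block = 0; block < grid_dim; ++block)
        for (int thread = 0; thread < block_dim; ++thread)
            for (int i = block * block_dim + thread; i < n; i += stride)
                ++visits[i];
    return visits;
}
```

With 4 blocks of 256 threads and N = 1<<20, every element is visited exactly once. With grid_dim = 1 the model reduces to the tutorial's own loop.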

banex19 · November 8, 2018, 11:11pm

I seem to have a similar problem: see the thread “Whole system freezes when using cudaMallocManaged” in CUDA Programming and Performance on the NVIDIA Developer Forums.

Robert_Crovella · February 5, 2019, 12:08am

Driver 418.81 was released today and may help with this issue. You may wish to try it.
