How to implement GPU crash reporting?

Samuel_Dong · March 14, 2019, 8:43am

Hi there,
I’m new to CUDA development and wonder whether there is an elegant way to collect GPU crash reports.
On the CPU side there are many mature toolkits, such as Google Breakpad; after we ship a PC application to our customers, we can find the cause of a crash from the dumped crash logs.
But in CUDA, I found that some memory issues are not caught by the debugger and leave no logs.
For example:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

__global__ void addKernel(int *c, const int *a, const int *b)
{

	int i = threadIdx.x;
	c[i] = a[i+1000000] + b[i+1000000] + threadIdx.x;
	printf("%d\n", c[i]);
}

int main()
{

	cudaError_t cudaStatus = cudaSuccess;

	cudaDeviceProp prop;
	cudaGetDeviceProperties(&prop, 0);

	printf("%s\n", prop.name);
	printf("prop.major = %d\n", prop.major);
	printf("prop.minor = %d\n", prop.minor);
	printf("prop.managedMemory = %d\n", prop.managedMemory);

	int *a, *b, *c;

	addKernel<<<1, 10>>>(c, a, b);
	cudaStatus = cudaGetLastError();
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "addKernel launch failed!");
		return 1;
	}

	cudaStatus = cudaFree(a);
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaFree failed!");
		return 1;
	}

	cudaStatus = cudaFree(b);
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaFree failed!");
		return 1;
	}

	cudaStatus = cudaFree(c);
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaFree failed!");
		return 1;
	}

	return 0;
}

This crashes the program without any useful call stack information.
I have tried Nvidia Nsight Graphics and Nsight CUDA debugging (the VS plugin), but found nothing.
Can anyone help?

saulocpp · March 14, 2019, 10:40am

Search for “cuda proper error checking”.
Then change the macro (my signature) if you want to print the message to screen or to a file.
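
A minimal sketch of what such an error-checking macro might look like, for illustration only. The names `checkCuda`, `CUDA_CHECK`, `faultyKernel`, and the log file `gpu_errors.log` are hypothetical and not from this thread. The kernel writes through a deliberately null pointer to provoke a fault. The launch check alone does not see such a fault; it is only reported by a later call that synchronizes with the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: logs a failed CUDA call with its source location
// and returns false so the caller can decide how to shut down.
static bool checkCuda(cudaError_t status, const char *expr,
                      const char *file, int line, FILE *log)
{
    if (status == cudaSuccess) {
        return true;
    }
    fprintf(log, "%s:%d: %s failed: %s (%s)\n", file, line, expr,
            cudaGetErrorName(status), cudaGetErrorString(status));
    fflush(log);  // make sure the message survives an abrupt exit
    return false;
}

// Hypothetical macro: captures the expression text, file, and line.
#define CUDA_CHECK(expr, log) \
    checkCuda((expr), #expr, __FILE__, __LINE__, (log))

// Writes through whatever pointer it is given; main() passes a null
// pointer on purpose to trigger a device-side fault.
__global__ void faultyKernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

int main()
{
    // Log to a file if possible, otherwise fall back to the screen.
    FILE *log = fopen("gpu_errors.log", "a");
    if (log == nullptr) {
        log = stderr;
    }

    int *out = nullptr;  // deliberately invalid device pointer
    faultyKernel<<<1, 10>>>(out);

    // Launch-configuration errors surface here.
    bool ok = CUDA_CHECK(cudaGetLastError(), log);
    // Faults during kernel execution are reported only after the host
    // synchronizes with the device.
    ok = ok && CUDA_CHECK(cudaDeviceSynchronize(), log);

    if (log != stderr) {
        fclose(log);
    }
    return ok ? 0 : 1;
}
```

Returning a status instead of calling `exit()` inside the helper is a design choice for this sketch: it lets the application flush its own logs or attach extra context before shutting down.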

Samuel_Dong · March 14, 2019, 12:06pm

Thanks, I already use this method in my code; I just wonder whether there is a more elegant way to do this.

Robert_Crovella · March 14, 2019, 1:59pm

Try googling CUDA_ENABLE_COREDUMP_ON_EXCEPTION

It has various limitations.

Even without that, you will get a call stack of sorts if you run your program from the command line with cuda-memcheck (you can google that also). You can get something similar from within Visual Studio by enabling memory checking.

Samuel_Dong · March 14, 2019, 2:39pm

But I can’t run it on my customers’ computers…
