Manually take Memory Dump?

Is there any way currently to generate a snapshot dump manually for a given process using CUDA, without intentionally causing an exception?

Hi @davido
There is a mechanism to trigger coredump generation manually:

Enabling user induced GPU core dump generation

For the devices that support compute preemption, the user can interrupt a running CUDA process to generate the GPU core dump.

Set the CUDA_ENABLE_USER_TRIGGERED_COREDUMP environment variable to 1 in order to enable generating a user induced GPU core dump. This option is disabled by default. Setting this environment variable will open a communication pipe for each subsequently running CUDA process. To induce the GPU core dump, the user simply writes to the pipe.

To change the default pipe file name, set the CUDA_COREDUMP_PIPE environment variable to a specific pipe name. The default pipe name is in the following format: corepipe.cuda.HOSTNAME.PID, where HOSTNAME is the host name of the machine running the CUDA application and PID is the process identifier of the CUDA application. This environment variable can take % specifiers as described in the above section.

More details can be found in cuda-gdb documentation: CUDA-GDB :: CUDA Toolkit Documentation (search for CUDA_ENABLE_USER_TRIGGERED_COREDUMP). To trigger the coredump you need to write anything to the pipe (not just open and close it).

Note that the coredump generation operation is destructive, so your app will be killed in the process. We don't support taking snapshots while the application is running yet.

Thank you for your response! Is there any way you can elaborate on this?
I have the CUDA_ENABLE_USER_TRIGGERED_COREDUMP environment variable set, both via export CUDA_ENABLE_USER_TRIGGERED_COREDUMP=1 and by adding that variable definition to the runtime configuration of the Python process I am trying to debug. I attached to that Python process by PID in cuda-gdb, but there seems to be no corepipe in my working directory (i.e. corepipe.cuda.HOSTNAME.PID) and I can't write to that pipe.
An example would be greatly appreciated.

Hi @davido,
The following sequence worked for me:

  • Run the CUDA application (I used the radixSortThrust CUDA sample, modified to run in a while (true) loop).
  • The pipe file is created in the current directory:
    % ls
    corepipe_akravets-dt_81706
  • Write 1 to that pipe: echo 1 > corepipe_akravets-dt_81706
  • Note that the CUDA process terminates and the dump file appears:
    Sorting 1048576 32-bit unsigned int keys and values

    zsh: abort (core dumped)  CUDA_ENABLE_USER_TRIGGERED_COREDUMP=1 ./radixSortThrust
    % ls
    core_1636543265_akravets-dt_81706.nvcudmp

Thanks! This helped me get farther than last time; however, I am still unable to reproduce what is happening here.

Starting any program, I do see the core pipe getting created now, but writing to the pipe seems to do nothing. I have tried a few approaches, including:

  1. Running a program with a while (true) loop thrown in, then writing to the pipe via echo 1 > corepipe_david.
  2. Launching a program with cuda-gdb, printing out the address of some GPU data structure pointer (e.g. 0xb012345678), then letting the program continue until it is caught in a while (true) loop, then writing echo 1 > corepipe_david.
  3. Running the radixSortThrust program with a while (true) loop inside of it (like in your test case), and writing echo 1 > corepipe_david while it is caught in the loop.

None of these worked on my setup, though getting to the point where the corepipe is created does work now (only after changing the display settings in the BIOS so that a display is not automatically assigned to a device).

Any ideas? Ultimately I would like to be able to inspect a data structure I moved over to the GPU, so this would be useful to figure out.

Hi @davido
Could you provide a bit more info about your setup?

  • The output of the nvidia-smi command.
  • Did you add the while(true) loop on host (so kernel is launched in a loop) or on GPU (so there is an infinite loop inside the kernel)?
  • If on host - are you running any kernels inside the loop body?

In my case, I have updated radixSortThrust as follows, wrapping the test calls in an infinite loop:

        while (true)
        {
            if (checkCmdLineFlag(argc, (const char **)argv, "float"))
                bTestResult = testSort<float, true>(argc, argv);
            else
                bTestResult = testSort<unsigned int, false>(argc, argv);
        }

The while true loop was added in my test program on the host in this way:

	while (true)
	{
		// Converting image to grayscale...
		convertToGrayScale<<<dimGrid, dimBlock>>>(dev_pOut, dev_pIn, dev_consts, width, height, channels);
	}

Where convertToGrayScale is defined as:

__global__ void convertToGrayScale(float *pOut, float *pIn, float *consts, int width, int height, int channels)

My program works if I just let it run normally; I am trying to use this as a simple case to inspect a data structure I moved into GPU memory.

My nvidia-smi output is shown below, though I am actually using CUDA 10.1, as it is necessary for my actual project.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M2000        Off  | 00000000:01:00.0 Off |                  N/A |
| 56%   47C    P0    24W /  75W |    176MiB /  4043MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro M2000        Off  | 00000000:05:00.0 Off |                  N/A |
| 56%   35C    P8     8W /  75W |      9MiB /  4043MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1110      G   /usr/lib/xorg/Xorg                 32MiB |
|    0   N/A  N/A      1806      G   /usr/lib/xorg/Xorg                 50MiB |
|    0   N/A  N/A      2011      G   /usr/bin/gnome-shell               81MiB |
|    1   N/A  N/A      1110      G   /usr/lib/xorg/Xorg                  2MiB |
|    1   N/A  N/A      1806      G   /usr/lib/xorg/Xorg                  2MiB |
+-----------------------------------------------------------------------------+
That is my nvidia-smi output, though I am doing all of this through a TTY on Ubuntu, since otherwise Xorg crashes my display whenever I use cuda-gdb on anything. My cuda-gdb version is:

10.1 release
Portions Copyright (C) 2007-2019 NVIDIA Corporation
GNU gdb (GDB) 7.12

in case that helps, and my nvcc version is

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Thank you for helping so far!

I am not sure how CUDA actually catches the write to the pipe and generates that core dump, but should intentionally causing a bug within my CUDA kernel also result in a core dump being made? Maybe it's an issue with communication through the pipe?

Hi @davido,
The nvidia-smi output points to the r495 driver (CUDA 11.5), but you mentioned that you are actually using CUDA 10.1 - can you provide more details on that? Are you using libraries from the CUDA 10.1 installation (e.g. by setting LD_LIBRARY_PATH to the CUDA 10.1 location)?

Could you also run deviceQuery sample using your CUDA 10.1 setup and share the output?

Some context - there are multiple components involved here:

  • Kernel driver (detected as r495 by nvidia-smi).
  • User-mode CUDA libraries - depending on your setup, these can be either from CUDA 11.5 or from CUDA 10.1; this needs to be confirmed.
  • Tools (compiler, cuda-gdb) - the tools from CUDA 10.1 are used.