From reading the documentation, I have tried to set CUDA_ENABLE_COREDUMP_ON_EXCEPTION to 1 by typing this in the terminal outside of cuda-gdb:
export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1
Then I opened the program in cuda-gdb and ran it. It hit a SIGINT caused by assert(false). However, there was no message about a core dump being created. Also, I don’t know where the file is supposed to be if it were created.
Did I enable the core dump correctly?
Is SIGINT not supposed to generate a core dump?
Where would the file be, and how can I change its default path?
Conceptually, at a high level, the process is similar to how you would use an “ordinary” CPU coredump:
You enable coredump.
You run your program normally (not in cuda-gdb).
Your program hits some kind of coredump fault (and exits, depositing a coredump file on disk).
You then start up cuda-gdb.
You don’t open your own program; instead, you open the coredump file.
How do you know you got the program to produce a coredump if you can’t locate the dumped file?
Anyway, none of this is obscure. Here’s a full test case:
$ export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1
$ cat t365.cu
__global__ void k(int *d){
int *x = NULL;
*d = *x;
}
int main(){
int *data;
cudaMalloc(&data, sizeof(int));
k<<<1,1>>>(data);
cudaDeviceSynchronize();
}
$ nvcc -o t365 t365.cu
$ ls core*
ls: cannot access core*: No such file or directory
$ ./t365
Message from syslogd@dc11 at Dec 31 01:31:38 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#18 stuck for 23s! [t365:7703]
Aborted (core dumped)
$ ls core*
core_1546237863_dc11.dc.nvidia.com_7688.nvcudmp
$
I see instructions in the documentation for changing the name of the coredump file. The file is written to the same path your executable uses. I don’t see instructions for changing the default coredump path to something other than your executable’s path.
This seems very straightforward to me. I’m not sure what the issue is.