Streaming CUDA core dumps into a pipe

Hello,

I’m building a service that runs on a server hosting many GPU jobs and collects the core dumps they produce. Since the server’s HDDs are already heavily loaded by these jobs, I’d rather not materialize the core dumps on disk. Instead, I’d prefer to write them into a pipe and stream them to remote storage.

Unfortunately, it seems that the CUDA library is unable to write core dumps into a pipe. When I set the CUDA_COREDUMP_FILE environment variable to the path of my pipe, only the first 64 bytes of the core dump are written.

After some investigation with strace, I found that the CUDA library calls ftell on the stream backing the core dump file. ftell returns -1 for pipes, after which the program terminates. Using the LD_PRELOAD mechanism, I implemented a custom version of ftell that counts the number of bytes written into the pipe, and this let me obtain an almost valid core dump. The only difference is that the first 64 bytes of the core dump end up at the end of the file; one possible explanation is that the CUDA library fseeks back to the beginning of the file to write the ELF header.

However, this custom approach seems far too fragile to rely on. Would it be possible to fix this in the CUDA library itself?

Best regards,
Grigory Reznikov.


Hi! Thank you very much for the report and the detailed investigation. We are actively working on addressing this issue, so it will be fixed in one of the upcoming releases.

I will provide another update in this post when the fixed CUDA GDB version is released.

Hi!

Core dump streaming to a pipe has been fixed in the CUDA GDB release shipped with CUDA Toolkit 11.4 (currently available at https://developer.nvidia.com/cuda-toolkit).

Great news, thank you very much!