Platform: RTX 5090
I set `export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1` and then ran my command.
A coredump was generated, but it is corrupted. Loading it in cuda-gdb fails:
`target cudacore core_1766111917_9180be2fb883_1125.nvcudmp`
Opening GPU coredump: core_1766111917_9180be2fb883_1125.nvcudmp
Failed to read core file: elfGetSectionHeaderStrTblIdx() failed: Section offset out of ELF image bounds
The coredump file is also small, only 261M:
-rw-r--r-- 1 root root 261M Dec 19 10:38 core_1766111917_9180be2fb883_1125.nvcudmp
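The "Section offset out of ELF image bounds" error together with the small file size suggests the dump may have been cut off before it was fully written. A minimal sketch for checking this, assuming the .nvcudmp file is a 64-bit little-endian ELF (the progress log does say "Writing ELF file") and that it does not use extended section numbering. The function names here are my own, not part of any CUDA tool:

```python
import os
import struct

ELF64_HEADER_SIZE = 64


def check_elf_section_table(header: bytes, file_size: int) -> dict:
    """Report whether the section header table declared in an ELF64
    little-endian header fits inside a file of file_size bytes."""
    if len(header) < ELF64_HEADER_SIZE:
        raise ValueError("need at least 64 bytes of ELF header")
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] == 2 means 64-bit, e_ident[5] == 1 means little-endian.
    if header[4] != 2 or header[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")

    # Standard ELF64 header field offsets.
    (e_shoff,) = struct.unpack_from("<Q", header, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", header, 0x3A)

    # Assumption: e_shnum holds the real section count
    # (extended numbering via section 0 is not handled).
    table_end = e_shoff + e_shentsize * e_shnum
    return {
        "file_size": file_size,
        "e_shoff": e_shoff,
        "e_shnum": e_shnum,
        "e_shstrndx": e_shstrndx,
        "section_table_end": table_end,
        "truncated": table_end > file_size,
    }


def check_file(path: str) -> dict:
    """Read the header of the file at path and run the check above."""
    with open(path, "rb") as f:
        header = f.read(ELF64_HEADER_SIZE)
    return check_elf_section_table(header, os.path.getsize(path))


# Example (path taken from the listing above):
# print(check_file("core_1766111917_9180be2fb883_1125.nvcudmp"))
```

If `truncated` comes back true, the header points past the end of the file, which would be consistent with an incomplete write rather than a cuda-gdb parsing problem.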
Tail of the coredump progress log:
[10:43:17.307094] coredump: SM 154/170 is not used by any context
[10:43:17.307101] coredump: SM 155/170 is not used by any context
[10:43:17.307107] coredump: SM 156/170 is not used by any context
[10:43:17.307114] coredump: SM 157/170 is not used by any context
[10:43:17.307120] coredump: SM 158/170 is not used by any context
[10:43:17.307126] coredump: SM 159/170 is not used by any context
[10:43:17.307131] coredump: SM 160/170 is not used by any context
[10:43:17.307136] coredump: SM 161/170 is not used by any context
[10:43:17.307143] coredump: SM 162/170 is not used by any context
[10:43:17.307148] coredump: SM 163/170 is not used by any context
[10:43:17.307153] coredump: SM 164/170 is not used by any context
[10:43:17.307159] coredump: SM 165/170 is not used by any context
[10:43:17.307165] coredump: SM 166/170 is not used by any context
[10:43:17.307171] coredump: SM 167/170 is not used by any context
[10:43:17.307177] coredump: SM 168/170 is not used by any context
[10:43:17.307181] coredump: SM 169/170 is not used by any context
[10:43:17.307185] coredump: SM 170/170 is not used by any context
[10:43:17.307191] coredump: Device 8/8 has finished state collection
[10:43:17.307646] coredump: Calculating ELF file layout
[10:43:17.341004] coredump: ELF file layout calculated
[10:43:17.341016] coredump: Writing ELF file to core_1766112187_9180be2fb883_2117.nvcudmp
[10:43:17.341030] coredump: Current working directory is /mnt/root/workspace/edgeep
[10:43:17.341072] coredump: Writing out global memory (16805299616 bytes)
[10:43:17.526974] coredump: SM 8/170 has finished state collection
[10:43:17.527011] coredump: SM 9/170 has finished state collection
[10:43:17.686841] coredump: SM 10/170 has finished state collection
[10:43:17.686924] coredump: SM 11/170 has finished state collection
[10:43:17.764728] coredump: SM 10/170 has finished state collection
[10:43:17.764745] coredump: SM 11/170 has finished state collection
[10:43:17.829748] coredump: SM 10/170 has finished state collection
[10:43:17.829785] coredump: SM 11/170 has finished state collection
[10:43:17.921470] coredump: SM 10/170 has finished state collection
[10:43:17.921502] coredump: SM 11/170 has finished state collection
[10:43:18.091996] coredump: SM 10/170 has finished state collection
[10:43:18.092026] coredump: SM 11/170 has finished state collection
[10:43:18.125158] coredump: SM 12/170 has finished state collection
[10:43:18.125241] coredump: SM 13/170 has finished state collection
[10:43:18.285244] coredump: SM 12/170 has finished state collection
[10:43:18.285287] coredump: SM 13/170 has finished state collection
[10:43:18.330517] coredump: SM 12/170 has finished state collection
[10:43:18.330548] coredump: SM 13/170 has finished state collection