I am using cuda-gdb from CUDA 12.5 with driver 535. I build cuda-gdb from source.
I am using cuda-gdb to attach to a PyTorch training process; my NCCL is built with -O3 -lineinfo.
With the previous version of cuda-gdb (12.1), the inlined-function information worked well, but cuda-gdb from CUDA 12.5 is missing it: the symbol names of inlined functions are truncated, as in this backtrace:
#0 ncclDevFunc_AllGather_RING_SIMPLE ()
at nccl/src/device/./prims_simple.h:239 in 2EaS1_Li2ELi2EE9ScattererILb1EEclILi1ELi0ELi1ELi0ELi7EEEviiiiiPPviS9_PiEUliE_ZS7_ILi1ELi0ELi1ELi0ELi7EEviiiiiS9_iS9_SA_EUliE0_EviRimPmbiRKT11_iRKT12_RT10_SM_ inlined from prims_simple.h:757
#1 0x00007fd7d5d826a0 in ncclDevKernel_AllGather_RING_LL<<<(16,1,1),(544,1,1)>>> () at nccl/src/device/./common.h:198 in 0ELi1ELi0ELi1ELi0ELin1EEEvllib inlined from all_gather.cu:3
How can I get the full symbol name of an inlined function?
Can you share the output of “nvidia-smi”? This will tell us more about your system and the GPUs installed.
When you said it worked with cuda-gdb from r12.1 but failed with r12.5, did you rebuild with the r12.5 compiler, or were you using the same binaries as you did with r12.1? This will help us determine if this is a compiler issue, or if it’s a debugger one.
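If it helps, the details asked for above can be gathered in one go. The following is only a minimal sketch: the helper name `collect_versions` is made up for illustration, and it assumes `nvidia-smi`, `cuda-gdb`, and `nvcc` are on your PATH (a cuda-gdb built from source may live elsewhere, in which case it is reported as not found). Note that the `nvcc` on PATH is not necessarily the compiler that built your NCCL binaries, so please still say which toolkit you built with.

```python
import shutil
import subprocess

# Commands whose output helps tell a compiler issue from a debugger issue.
# The dictionary keys are just labels for the report.
COMMANDS = {
    "nvidia-smi": ["nvidia-smi"],
    "cuda-gdb": ["cuda-gdb", "--version"],
    "nvcc": ["nvcc", "--version"],
}


def collect_versions(commands=COMMANDS, timeout=30):
    """Run each command found on PATH and return {label: output or note}."""
    report = {}
    for name, argv in commands.items():
        # Skip tools that are not installed or not on PATH.
        if shutil.which(argv[0]) is None:
            report[name] = "not found on PATH"
            continue
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            # Some tools print their banner to stderr, so fall back to it.
            report[name] = (result.stdout or result.stderr).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            report[name] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    for name, output in collect_versions().items():
        print(f"== {name} ==\n{output}\n")
```

Running the script on the machine where the problem reproduces and pasting its output here would cover the driver, GPU, debugger, and compiler versions in a single post.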