Torch profiler segfault in libkineto/CUPTI when exporting a Chrome trace for an ML workload

I am profiling an ML workload with the PyTorch profiler (torch.profiler). The relevant code looks like this:

    from torch.profiler import profile, ProfilerActivity

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                 record_shapes=True) as prof:
        main_args = parse_main_args()
        main(main_args, DETECTED_SYSTEM)
    prof.export_chrome_trace("torch_trace.json")
    # print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=20))
    # print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=20))

The code runs fine without the profiler, and it also runs to completion with the profiler enabled. However, when execution reaches the export_chrome_trace call, I get the error below (an on_trace_ready variant I am considering as a workaround is sketched at the end of the post):

    [mlperf-inference-skps-x86-64-29200:6413 :0:6413] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x55ea1b76a8cc)
    ==== backtrace (tid:   6413) ====
     0 0x0000000000042520 __sigaction()  ???:0
     1 0x0000000006743c49 libkineto::CuptiCallbackApi::__callback_switchboard()  ???:0
     2 0x00000000067441ba libkineto::callback_switchboard()  CuptiCallbackApi.cpp:0
     3 0x0000000000117456 cuptiEnableAllDomains()  ???:0
     4 0x000000000010f5c4 cuptiGetRecommendedBufferSize()  ???:0
     5 0x000000000010d3a8 cuptiGetRecommendedBufferSize()  ???:0
     6 0x00000000001b295d cudbgApiInit()  ???:0
     7 0x00000000001b393b cudbgApiInit()  ???:0
     8 0x00000000001ae05c cudbgApiInit()  ???:0
     9 0x00000000002d2188 cuStreamWaitEvent()  ???:0
    10 0x0000000000027ee8 __cudaRegisterUnifiedTable()  ???:0
    11 0x000000000002856d __cudaRegisterUnifiedTable()  ???:0
    12 0x0000000000045495 secure_getenv()  ???:0
    13 0x0000000000045610 exit()  ???:0
    14 0x0000000000029d97 __libc_init_first()  ???:0
    15 0x0000000000029e40 __libc_start_main()  ???:0
    16 0x000000000024ec65 _start()  ???:0
    =================================
    /bin/bash: line 1:  6413 Segmentation fault      (core dumped) LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/torch/lib:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:/work/build/inference/loadgen/build python3.10 -m code.main --benchmarks=dlrm-v2 --scenarios=offline --action="run_harness" 2>&1
          6414 Done                    | tee /work/build/logs/2025.01.21-19.47.54/stdout.txt
    make: *** [Makefile:46: run_harness] Error 139
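
For reference, this is the variant I was considering instead, routing the export through an on_trace_ready callback rather than calling export_chrome_trace after the block. It is only a sketch, reusing the same harness entry points as above, and I don't yet know whether it avoids the crash:

    from torch.profiler import profile, ProfilerActivity

    def dump_trace(prof):
        # Invoked by the profiler once tracing has stopped (i.e. when the
        # with-block exits), so the kineto trace is already collected here.
        prof.export_chrome_trace("torch_trace.json")

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                 record_shapes=True,
                 on_trace_ready=dump_trace) as prof:
        main_args = parse_main_args()
        main(main_args, DETECTED_SYSTEM)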