CUPTI activity tracer hangs in cuptiActivityFlushAll when tracing PyTorch models

I have a CUPTI tracer that uses the Activity API. It works well with plain CUDA and TensorFlow workloads, but it hangs when I run a PyTorch model, specifically in the call to cuptiActivityFlushAll.

Here is where it seems to hang:

* frame #0: 0x00007fff052b96e6`do_futex_wait at futex-internal.h:205
    frame #1: 0x00007fff052b96be`do_futex_wait(sem=0x000000000addd940, abstime=0x0000000000000000) at sem_waitcommon.c:111
    frame #2: 0x00007fff052b97d8`__new_sem_wait_slow(sem=0x000000000addd940, abstime=0x0000000000000000) at sem_waitcommon.c:181
    frame #3: 0x00007fff01afd88f`___lldb_unnamed_symbol3808$$ + 239
    frame #4: 0x00007fff0197632e`cuptiActivityFlushAll + 526

Using a debugger, I can see another thread waiting here:

frame #0: 0x00007fff052b69f3`__pthread_cond_wait at futex-internal.h:88
    frame #1: 0x00007fff052b69d8`__pthread_cond_wait at pthread_cond_wait.c:502
    frame #2: 0x00007fff052b68f8`__pthread_cond_wait(cond=0x00000000170b0850, mutex=0x00000000170b0828) at pthread_cond_wait.c:655
    frame #3: 0x00007fff008afbdd`___lldb_unnamed_symbol3770$ + 253
    frame #4: 0x00007fff0084e653`___lldb_unnamed_symbol2219$ + 147
    frame #5: 0x00007fff008aee18`___lldb_unnamed_symbol3727$ + 40
    frame #6: 0x00007fff052b06db`start_thread(arg=0x00007ffef4b48700) at pthread_create.c:463
    frame #7: 0x00007fff0482fb2f`__GI___clone at clone.S:95

Is there anything that might be causing this? I'd love some pointers to help me debug it. The tracer and the GPU workload run in the same process.

Thank you,

Hi Sujan,

Which CUDA toolkit and GPU are you using? If you are on an older toolkit, please check whether the issue reproduces with a recent one (CUDA 11.0 or 10.2). Another experiment worth trying is a forced flush: pass 1 (CUPTI_ACTIVITY_FLAG_FLUSH_FORCED) as the flag to cuptiActivityFlushAll.

It's difficult to identify the issue from the call stack alone. Would it be possible for you to share a minimal reproducer?

It turns out I had a bug that freed a CUPTI activity buffer before CUPTI had handed it back through the buffer_returned callback.
That caused cuptiActivityFlushAll to hang.
Fixing the bug fixed the issue.
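For anyone hitting the same hang, here is a minimal sketch of the ownership rule behind that fix: a buffer handed to CUPTI in the buffer-requested callback belongs to CUPTI until CUPTI returns it through the buffer-completed callback, and the tracer must not free or reuse it in between. It is written in plain C so it builds without the CUDA toolkit. `buffer_requested` has the same parameter list as CUPTI's buffer-requested callback type (CUpti_BuffersCallbackRequestFunc, minus the CUPTIAPI macro); the loan table, `release_returned_buffer`, and `safe_to_free` are hypothetical debugging helpers, not part of CUPTI. In a real tracer, the request callback and a completed callback would be registered with cuptiActivityRegisterCallbacks, and the completed callback would drain the records with cuptiActivityGetNextRecord before calling `release_returned_buffer`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_SIZE (1u << 20) /* 1 MiB per activity buffer: an arbitrary choice */
#define MAX_LOANED 64       /* hypothetical cap on buffers on loan at once */

/* Hypothetical bookkeeping: buffers currently owned by CUPTI. */
static uint8_t *loaned[MAX_LOANED];
static int num_loaned = 0;

static int find_loaned(const uint8_t *buf) {
    for (int i = 0; i < num_loaned; i++)
        if (loaned[i] == buf) return i;
    return -1;
}

/* Same parameters as CUPTI's buffer-requested callback. After this returns,
 * the buffer is on loan to CUPTI and the tracer must not touch it. */
static void buffer_requested(uint8_t **buffer, size_t *size, size_t *max_num_records) {
    *buffer = NULL;
    *size = 0;
    *max_num_records = 0; /* 0 = fill with as many records as fit */
    if (num_loaned == MAX_LOANED) return; /* table full: hand out no buffer */
    /* malloc's alignment covers CUPTI's 8-byte alignment requirement on
     * common 64-bit platforms. */
    uint8_t *buf = malloc(BUF_SIZE);
    if (buf == NULL) return;
    loaned[num_loaned++] = buf;
    *buffer = buf;
    *size = BUF_SIZE;
}

/* Hypothetical helper, called from the buffer-completed callback once the
 * records have been consumed: only now is the buffer freed.
 * Returns 0 on success, -1 if the buffer was never on loan. */
static int release_returned_buffer(uint8_t *buffer) {
    int i = find_loaned(buffer);
    if (i < 0) return -1;
    loaned[i] = loaned[--num_loaned]; /* swap-remove from the loan table */
    free(buffer);
    return 0;
}

/* Hypothetical guard for any other code path that frees or recycles a
 * buffer: returns 0 while CUPTI still owns it. */
static int safe_to_free(const uint8_t *buffer) {
    return find_loaned(buffer) < 0;
}
```

Asserting `safe_to_free(buf)` wherever the tracer frees or recycles a buffer outside the completed callback would have flagged the premature free described above, instead of surfacing later as a hang in cuptiActivityFlushAll.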