Application Range Replay frozen

I am trying to use the app-range replay option as there are certain API calls that the range option errors out on.
However, the profiler seems to hang after a couple of ranges and stays there.


I’ve added a picture of the ncu options that I am using; I’ve confirmed that kernel and application replay works with these options, so why would app-range fail?


Hi, @gohilshiv1
Sorry for the issue you met. But we can’t figure out the reason unless we have a repro.
Can you provide the repro ?


I am using a forked version of the Megatron repo, linked here: Megatron-LM/megatron/ at testProfiles · sgohil3/Megatron-LM · GitHub

I am profiling the nvtx range starting at line 791 to line 900 for


This is the profiling script that I am using if that helps: Megatron-LM/ at testProfiles · sgohil3/Megatron-LM · GitHub

Thanks. We’ll try to reproduce internally.

Hi, @gohilshiv1

app-range replay requires deterministic program execution.

Wouldn’t that throw an error when it tries to run the second pass? This is freezing in the middle of the first pass