Hi,
I am using Nsight Compute and Nsight Systems to profile BERT in PyTorch. I want to know whether I need to wrap the code of interest with "torch.cuda.synchronize()" as follows. Is it OK to use only range_push/range_pop when I use Nsight Compute and Nsight Systems?
torch.cuda.synchronize()
torch.cuda.nvtx.range_push("test")
with torch.cuda.profiler.profile():
    with torch.autograd.profiler.emit_nvtx():
        ...  # interesting code
torch.cuda.synchronize()
torch.cuda.nvtx.range_pop()
As far as I know, if I want to measure elapsed time, the code needs to be wrapped with synchronization calls. (link)
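To make the timing case concrete, here is a minimal sketch of what I mean by wrapping timed code with synchronization. The helper name `timed` and its `sync` parameter are my own and not part of PyTorch. The sketch assumes the caller passes a synchronization function such as `torch.cuda.synchronize`.

```python
import time


def timed(fn, sync=lambda: None):
    """Run fn() and return (result, elapsed_seconds).

    `sync` is a hypothetical hook: it is called once before the timer
    starts and once after fn() returns, so that pending asynchronous
    work is excluded from, and launched work is included in, the
    measured interval. The default is a no-op.
    """
    sync()                      # drain work queued before the region
    start = time.perf_counter()
    result = fn()
    sync()                      # wait for work launched inside the region
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With PyTorch on a GPU I would expect to call this as `timed(lambda: model(inputs), sync=torch.cuda.synchronize)`, where `model` and `inputs` are placeholders for the BERT model and its batch. Without the second synchronization the timer would likely stop as soon as the kernels are launched, not when they finish.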
Any answer would be very helpful.
Thank you!