I am using Nsight Compute and Nsight Systems to profile BERT in PyTorch. Do I need to wrap the code of interest with “torch.cuda.synchronize()” as follows? Or is it enough to use only range_push/range_pop when profiling with Nsight Compute and Nsight Systems?
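A minimal sketch of the two variants in question, assuming a hypothetical helper `profiled_region` (not a PyTorch API) and a small `torch.nn.Linear` layer as a stand-in for the BERT forward pass. The helper skips the NVTX and synchronization calls when no CUDA device is available, so the sketch also runs on a CPU-only machine.

```python
from contextlib import contextmanager

import torch


@contextmanager
def profiled_region(name, sync=False):
    """Mark a region with an NVTX range; optionally synchronize around it.

    Hypothetical helper for illustration only, not part of PyTorch.
    """
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        if sync:
            torch.cuda.synchronize()  # drain earlier GPU work before the range opens
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if use_cuda:
            if sync:
                torch.cuda.synchronize()  # wait for the region's kernels before closing
            torch.cuda.nvtx.range_pop()


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 16, device=device)
layer = torch.nn.Linear(16, 4).to(device)  # stand-in for the BERT forward pass

# Variant 1: NVTX range only.
with profiled_region("forward_nvtx_only"):
    y1 = layer(x)

# Variant 2: NVTX range wrapped with synchronization.
with profiled_region("forward_with_sync", sync=True):
    y2 = layer(x)
```

Because CUDA kernel launches are asynchronous, the range in the first variant brackets the host-side launch calls, whereas the second variant also waits for the kernels launched inside it to finish before the range closes.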
As far as I know, measuring elapsed time requires wrapping the code with synchronization. (link)
Any answer would be appreciated.