I profiled the execution time of a conv3d kernel, but ncu and the PyTorch profiler report different times.
1. I profile with PyTorch like this:
    from torch.profiler import profile, ProfilerActivity
    with profile(activities=[ProfilerActivity.CUDA, ProfilerActivity.CPU], record_shapes=True, profile_memory=True, with_modules=True) as prof:
        output = model(inputs)
The profiler uses CUPTI to get the kernel time.
It reports 13000 ms for the kernel.
2. I ran the same Python script under ncu. It reports 16000 ms for the kernel.
Why does PyTorch/CUPTI report less time? The difference is about 23%. Which one is more accurate?
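
For reference, this is how I arrive at the 23% figure. It is only a small sketch of the arithmetic: the helper name `relative_gap` and the variable names are made up for illustration, and the two inputs are the numbers quoted above. The percentage depends on which tool is taken as the baseline.

```python
def relative_gap(reference, other):
    """Return how far `other` is from `reference`, as a percentage of `reference`."""
    return (other - reference) / reference * 100.0

# Kernel times quoted above (same unit for both tools).
pytorch_time = 13000.0  # reported by the PyTorch profiler (CUPTI)
ncu_time = 16000.0      # reported by ncu

# Gap with the PyTorch number as the baseline (the ~23% mentioned above).
gap_vs_pytorch = relative_gap(pytorch_time, ncu_time)

# Gap with the ncu number as the baseline (smaller in magnitude, and negative).
gap_vs_ncu = relative_gap(ncu_time, pytorch_time)

print(f"ncu is {gap_vs_pytorch:.1f}% above the PyTorch profiler time")
print(f"PyTorch profiler is {abs(gap_vs_ncu):.2f}% below the ncu time")
```

So the same 3000 difference reads as roughly 23% relative to the PyTorch number, or just under 19% relative to the ncu number.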