AFAIK, dlprofviewer calculates the total wall clock time and other useful metrics from the Nsight Systems file nsys_profile.sqlite. It would also be very helpful to have documentation on how the different metrics shown in dlprof are calculated from the CUPTI tables.
Update
The GPU utilization metric shown in dlprof matches the sum of (end - start) over all kernels in CUPTI_ACTIVITY_KIND_KERNEL. Likewise, the Memory utilization metric shown in dlprof matches the sum of (end - start) over all entries in CUPTI_ACTIVITY_KIND_MEMCPY.
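For anyone who wants to reproduce the per-table sum described above, here is a minimal sketch using Python's built-in sqlite3 module. It assumes each table has integer `start` and `end` timestamp columns (nanoseconds, as far as I can tell); the function name `total_duration_ns` and the file path in the usage comment are only illustrative. This is not necessarily how dlprof computes its metrics internally, just the query that produced matching numbers.

```python
import sqlite3

# Tables this sketch is willing to query. The table name is interpolated
# into the SQL text, so it is restricted to a fixed whitelist.
ALLOWED_TABLES = {
    "CUPTI_ACTIVITY_KIND_KERNEL",
    "CUPTI_ACTIVITY_KIND_MEMCPY",
}


def total_duration_ns(conn: sqlite3.Connection, table: str) -> int:
    """Return the sum of (end - start) over all rows of `table`.

    Assumption: the table has integer `start` and `end` columns.
    Overlapping intervals are counted once per row, so concurrent
    kernels or copies contribute more than the wall clock time they span.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    # "end" is quoted because END is an SQL keyword.
    query = f'SELECT SUM("end" - "start") FROM {table}'
    (total,) = conn.execute(query).fetchone()
    # SUM() yields NULL (None in Python) when the table has no rows.
    return int(total) if total is not None else 0


# Usage (hypothetical path):
# conn = sqlite3.connect("nsys_profile.sqlite")
# print(total_duration_ns(conn, "CUPTI_ACTIVITY_KIND_KERNEL"))
# print(total_duration_ns(conn, "CUPTI_ACTIVITY_KIND_MEMCPY"))
```

Note that a plain sum double-counts any time during which several kernels or copies run concurrently, so it is only a rough stand-in for a utilization figure.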
However, there is no entry in the tables for the CPU time. So how does dlprof calculate the CPU time for the ops? Thanks.