I have some questions about how profiling works when multiple applications run on the GPU at the same time.
For multi-application profiling, I use the following commands:
nvprof ./taskA --csv --kernels ::kernelA: --metrics achieved_occupancy... & nvprof ./taskB --csv --kernels ::kernelB: --metrics achieved_occupancy... &
I used the nvidia-smi command and confirmed that the two applications were running at the same time.
Q1: In this case, do the different applications use different CUDA contexts?
When I checked the nvprof results, I found that the achieved_occupancy of taskA was 0.719 and that of taskB was 0.895.
Q2: Why is the sum of the achieved_occupancy results of taskA and taskB greater than 1 (0.719 + 0.895 > 1)? According to Profiler :: CUDA Toolkit Documentation, it seems that events and metrics are associated with the application. But in the description of achieved_occupancy (Profiler :: CUDA Toolkit Documentation), it is a metric that measures the overall utilization of the SMs, so in theory it should not exceed 1.0.
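To make the arithmetic behind Q2 concrete, here is a minimal sketch of how I read the metric. It assumes achieved_occupancy is the ratio of average active warps per active cycle to the maximum number of warps an SM supports. The limit of 64 warps per SM is a hypothetical value chosen for illustration; the real limit depends on the GPU.

```python
# Assumption: 64 resident warps per SM is a hypothetical hardware limit;
# the real value depends on the GPU's compute capability.
MAX_WARPS_PER_SM = 64

# achieved_occupancy values reported by nvprof for the two applications.
reported = {"taskA": 0.719, "taskB": 0.895}


def implied_active_warps(occupancy, max_warps=MAX_WARPS_PER_SM):
    """Average active warps per active cycle implied by an occupancy ratio."""
    return occupancy * max_warps


warps = {name: implied_active_warps(occ) for name, occ in reported.items()}
total_occupancy = sum(reported.values())
total_warps = sum(warps.values())

for name, w in warps.items():
    print(f"{name}: {w:.1f} implied warps (limit {MAX_WARPS_PER_SM})")
print(f"sum of occupancies: {total_occupancy:.3f}")
print(f"sum of implied warps: {total_warps:.1f} (limit {MAX_WARPS_PER_SM})")
```

Each value on its own stays below the limit, but if both kernels really shared the same SMs in the same cycles, the combined warp count would exceed what a single SM can hold. That is why the two results look inconsistent to me.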
Q3: When profiling, are the kernels executed serially? If so, does the serialization apply only within a single application, or to all applications running at the same time, as discussed in NVPROF & NV_NSIGHT are much slower than adding CUPTI to the code - #2 by mjain?
In addition to the above profiling method, I also tried the --profile-all-processes mode, and the total achieved_occupancy of the tasks running at the same time was still greater than 1.0.
Looking forward to your reply! Thank you!