How should I interpret profiling metrics when multiple applications run at the same time?

Hi, all.

I have some questions about how profiling works when multiple applications run at the same time.

For multi-application profiling, I use the following commands:

    nvprof --csv --kernels ::kernelA: --metrics achieved_occupancy... ./taskA &
    nvprof --csv --kernels ::kernelB: --metrics achieved_occupancy... ./taskB &

I checked with nvidia-smi and found that the two applications were running at the same time.

Q1: In this case, do the different applications use different CUDA contexts?

When I checked the nvprof results, I found that the achieved_occupancy of taskA was 0.719 and that of taskB was 0.895.

Q2: Why is the sum of the achieved_occupancy values of taskA and taskB greater than 1 (0.719 + 0.895 > 1)? According to Profiler :: CUDA Toolkit Documentation, events and metrics seem to be associated with the application. But according to the description of achieved_occupancy (Profiler :: CUDA Toolkit Documentation), it measures the overall utilization of the SMs and in theory should not exceed 1.0.
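To make the arithmetic behind Q2 concrete, here is a minimal sketch. It assumes the documented definition of achieved_occupancy (average active warps per active cycle divided by the maximum number of warps an SM supports) and the 64-warp per-SM limit of V100 (compute capability 7.0). The helper name `avg_active_warps` is only for illustration. It also assumes, as my question does, that both kernels were resident on the same SMs while being measured, which may not be what actually happens under the profiler.

```python
# Values reported by nvprof for the two applications in this post.
occ_task_a = 0.719
occ_task_b = 0.895

# Assumption: a V100 SM (compute capability 7.0) holds at most 64 resident warps.
MAX_WARPS_PER_SM = 64


def avg_active_warps(achieved_occupancy, max_warps=MAX_WARPS_PER_SM):
    """Convert an achieved_occupancy ratio back to average active warps per SM."""
    return achieved_occupancy * max_warps


warps_a = avg_active_warps(occ_task_a)
warps_b = avg_active_warps(occ_task_b)
total_warps = warps_a + warps_b

print(f"taskA: {warps_a:.2f} warps, taskB: {warps_b:.2f} warps")
print(f"sum: {total_warps:.2f} warps vs. limit of {MAX_WARPS_PER_SM}")

# If the two kernels truly shared the same SMs, the sum could not exceed the limit.
print("exceeds per-SM limit:", total_warps > MAX_WARPS_PER_SM)
```

Under these assumptions the two measurements together imply more resident warps than a single SM can hold, which is why I suspect each value is measured per kernel or per application rather than for the whole GPU.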

Q3: During profiling, are kernels executed serially? If so, does this apply only within a single application, or across all applications running at the same time, as discussed in NVPROF & NV_NSIGHT are much slower than adding CUPTI to the code - #2 by mjain?

In addition to the profiling method above, I also tried the --profile-all-processes mode, and the total achieved_occupancy of the tasks running at the same time was still greater than 1.0.

Looking forward to your reply! Thank you!

My environment information is shown below:
OS: ubuntu1~18.04
GPU: V100
CUDA: 10.1