Hi. I’m trying to interpret the result of CUPTI sample, pc_sampling
.
I got this result.
1 Device Name: NVIDIA GeForce RTX 3090
2 Device compute capability: 8.6
3 id 1, ctx 1, moduleId 18, functionIndex 8, name _Z6VecAddPKiS0_Pii
4 Source Locator Id 1, File /home/user01/ksy/test/pc_sampling.cu Line 46
5 source 1, functionId 1, pc 0x100, corr 1, samples 1, latency samples 1, stallreason Memory throttle
6 Source Locator Id 2, File /home/user01/ksy/test/pc_sampling.cu Line 45
7 source 2, functionId 1, pc 0xd0, corr 1, samples 12, latency samples 12, stallreason Memory dependency
8 Source Locator Id 3, File /home/user01/ksy/test/pc_sampling.cu Line 43
9 source 3, functionId 1, pc 0x10, corr 1, samples 1, latency samples 1, stallreason Execution dependency
10 Source Locator Id 4, File /home/user01/ksy/test/pc_sampling.cu Line 42
11 source 4, functionId 1, pc 0x0, corr 1, samples 3, latency samples 3, stallreason Instruction fetch
12 source 4, functionId 1, pc 0x0, corr 1, samples 7, latency samples 7, stallreason Constant memory dependency
13 source 2, functionId 1, pc 0x70, corr 1, samples 2, latency samples 2, stallreason Constant memory dependency
14 Source Locator Id 5, File /home/user01/ksy/test/pc_sampling.cu Line 44
15 source 5, functionId 1, pc 0x40, corr 1, samples 1, latency samples 1, stallreason Execution dependency
16 source 5, functionId 1, pc 0x40, corr 1, samples 13, latency samples 13, stallreason Constant memory dependency
17 source 5, functionId 1, pc 0x50, corr 1, samples 1, latency samples 1, stallreason Execution dependency
18 source 2, functionId 1, pc 0xa0, corr 1, samples 1, latency samples 0, stallreason Selected
19 corr 1, totalSamples 42, droppedSamples 0, samplingPeriodInCycles 32
and kernel code
42 __global__ void VecAdd(const int *A, const int *B, int *C, int N) {
43 int i = blockDim.x * blockIdx.x + threadIdx.x;
44 if (i < N)
45 C[i] = A[i] + B[i];
46 }
So it is right to interpret this result like this?:
- for the first sampling period, Memory throttle happened in line 46
- for the 2nd~13th period(12 samples), there was memory dependency in line 45
If it is right to interpret like this, then how come source locator is located at line 10?
I mean line 42 is starting line of kernel code, so doesn’t it have to be in the top of the sampling if sampled?
Thank you in advance!