I tried the callback_metric sample from CUPTI 10 on Ubuntu 18.04 with a GeForce MX150, then checked its results against nvprof from CUDA 10. Sometimes the two agree, but sometimes CUPTI returns inflated values, as if extra counts had been accumulated.
For vector add with size = 1 << 20, the CUPTI results are:
CUDA Device Number: 0
CUDA Device Name: GeForce MX150
Launching kernel: blocks 4096, thread/block 256
Pass 0
Launching kernel: blocks 4096, thread/block 256
l2_subp0_total_read_sector_queries = 131096 (65544, 65552)
l2_subp0_total_read_sector_queries (normalized) (131096 * 2) / 2 = 131096
l2_subp1_total_read_sector_queries = 131088 (65544, 65544)
l2_subp1_total_read_sector_queries (normalized) (131088 * 2) / 2 = 131088
Metric l2_read_transactions = 262184
and sometimes they are:
CUDA Device Number: 0
CUDA Device Name: GeForce MX150
Launching kernel: blocks 4096, thread/block 256
Pass 0
Launching kernel: blocks 4096, thread/block 256
l2_subp0_total_read_sector_queries = 131101 (65540, 65561)
l2_subp0_total_read_sector_queries (normalized) (131101 * 2) / 2 = 131101
l2_subp1_total_read_sector_queries = 262200 (131080, 131120)
l2_subp1_total_read_sector_queries (normalized) (262200 * 2) / 2 = 262200
Metric l2_read_transactions = 393301
but nvprof always returns:
==25701== Event result:
Invocations Event Name Min Max Avg Total
Device "GeForce MX150 (0)"
Kernel: kernel(int const *, int const , int, int)
1 l2_subp0_total_read_sector_queries 131108 131108 131108 131108
1 l2_subp1_total_read_sector_queries 131093 131093 131093 131093
==25701== Metric result:
Invocations Metric Name Metric Description Min Max Avg
Device "GeForce MX150 (0)"
Kernel: kernel(int const *, int const , int, int)
1 l2_read_transactions L2 Read Transactions 262201 262201 262201
What's wrong with CUPTI?
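To make the discrepancy easier to see, here is a minimal Python sketch that redoes the arithmetic from the logs above. It assumes that l2_read_transactions is simply the sum of the two normalized subpartition events, which is consistent with every number printed above but is not something I have confirmed in the CUPTI documentation. The helper names (normalize, l2_read_transactions) are my own and are not part of CUPTI.

```python
# Per-instance event values copied from the two CUPTI runs above.
good_run = {
    "l2_subp0_total_read_sector_queries": (65544, 65552),
    "l2_subp1_total_read_sector_queries": (65544, 65544),
}
bad_run = {
    "l2_subp0_total_read_sector_queries": (65540, 65561),
    "l2_subp1_total_read_sector_queries": (131080, 131120),
}
nvprof_metric = 262201  # l2_read_transactions as reported by nvprof


def normalize(instance_values, total_instances):
    """Hypothetical helper mirroring the formula the sample prints:
    (sum * totalInstances) / numInstances."""
    return sum(instance_values) * total_instances // len(instance_values)


def l2_read_transactions(run, total_instances=2):
    """Assumption: the metric is the sum of the normalized subp0/subp1 events."""
    return sum(normalize(v, total_instances) for v in run.values())


good_metric = l2_read_transactions(good_run)
bad_metric = l2_read_transactions(bad_run)

# Ratio of the subp1 event between the bad and the good run.
subp1 = "l2_subp1_total_read_sector_queries"
subp1_ratio = sum(bad_run[subp1]) / sum(good_run[subp1])

print("good run metric:", good_metric, "diff vs nvprof:", good_metric - nvprof_metric)
print("bad run metric: ", bad_metric, "diff vs nvprof:", bad_metric - nvprof_metric)
print("subp1 bad/good ratio: %.4f" % subp1_ratio)
```

In the bad run only the subp1 event is off, and each of its instance values is almost exactly twice the value from the good run, while subp0 stays in the expected range.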