The profiling metrics are quite weird on Xavier

We profiled our kernel function on two machines. The source code is the same, but some metrics differ a lot, e.g. “shared_load_transactions”.
The profiling results are as follows:

Invocations                               Metric Name                             Metric Description         Min         Max         Avg
Device "GeForce 940MX (0)"
    Kernel: match_kernel
          5      shared_load_transactions_per_request    Shared Memory Load Transactions Per Request    1.000000    1.000000    1.000000
          5     shared_store_transactions_per_request   Shared Memory Store Transactions Per Request    0.000000    0.000000    0.000000
          5                         shared_efficiency                       Shared Memory Efficiency     100.00%     100.00%     100.00%
          5                 shared_store_transactions                      Shared Store Transactions           0           0           0
          5                  shared_load_transactions                       Shared Load Transactions     1990656     1990656     1990656
          5                                       ipc                                   Executed IPC    1.899034    1.903520    1.901302
          5                        achieved_occupancy                             Achieved Occupancy    0.093046    0.093062    0.093053
          5                                issued_ipc                                     Issued IPC    1.899049    1.903535    1.901318
Invocations                               Metric Name                             Metric Description         Min         Max         Avg
Device "Xavier (0)"
    Kernel: match_kernel
          6      shared_load_transactions_per_request    Shared Memory Load Transactions Per Request    1.003864    1.004223    1.004103
          6     shared_store_transactions_per_request   Shared Memory Store Transactions Per Request    0.000000    0.000000    0.000000
          6                         shared_efficiency                       Shared Memory Efficiency      99.58%      99.62%      99.59%
          6                 shared_store_transactions                      Shared Store Transactions           0           0           0
          6                  shared_load_transactions                       Shared Load Transactions     6661157     6663541     6662742
          6                                       ipc                                   Executed IPC    2.713783    2.733202    2.724108
          6                        achieved_occupancy                             Achieved Occupancy    0.124009    0.124711    0.124416
          6                                issued_ipc                                     Issued IPC    2.713786    2.733206    2.724112
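As a rough cross-check, nvprof's shared_load_transactions_per_request is the number of shared load transactions divided by the number of shared load requests. Dividing one column by the other therefore gives the implied request count on each device. The sketch below uses only the Avg values printed above. The names `profiles` and `implied_requests` are illustrative and do not come from any tool.

```python
# Avg values copied from the nvprof output above (one kernel launch each).
profiles = {
    "GeForce 940MX": {"transactions": 1990656, "per_request": 1.000000},
    "Xavier": {"transactions": 6662742, "per_request": 1.004103},
}


def implied_requests(transactions: float, per_request: float) -> float:
    """Shared load requests implied by transactions / (transactions per request)."""
    return transactions / per_request


requests = {
    name: implied_requests(p["transactions"], p["per_request"])
    for name, p in profiles.items()
}
ratio = requests["Xavier"] / requests["GeForce 940MX"]

for name, r in requests.items():
    print(f"{name}: ~{r:,.0f} shared load requests per launch")
print(f"Xavier / 940MX request ratio: {ratio:.3f}")
```

On Xavier, transactions per request stay close to 1, so the averages imply roughly 10/3 as many shared load *requests* as on the 940MX. The higher transaction count therefore seems to come from the request count rather than from bank conflicts. This is only an estimate: it divides averages of per-launch values, and the two runs cover different numbers of invocations (5 vs. 6).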

What’s wrong?

Hi,

Is this a duplicate of topic 1055462?
https://devtalk.nvidia.com/default/topic/1055462/jetson-agx-xavier/performance-goes-down-when-our-kernel-function-runs-on-xavier-compared-to-geforce-940mx/

If so, would you mind maximizing the performance first?
Thanks.

Yes, it’s one of the two problems we found with the same kernel.

The shared_load_transactions metric stays the same when we replace __vsadu4 with __vsub4,
and the performance reversal reported in https://devtalk.nvidia.com/default/topic/1055462/jetson-agx-xavier/performance-goes-down-when-our-kernel-function-runs-on-xavier-compared-to-geforce-940mx/ remains.

So we conclude that the shared_load_transactions metric does not seem to affect performance.
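The following host-side Python sketch shows what the two CUDA SIMD intrinsics compute, following their per-byte description in the CUDA Math API. It emulates the arithmetic only. `vsadu4` and `vsub4` are hypothetical Python stand-ins, not CUDA functions. Both intrinsics take the same two packed 32-bit operands. Assuming the kernel loads those operands from shared memory in the same way for either variant, the swap should not change the shared-load pattern, which is consistent with the unchanged metric.

```python
def vsadu4(a: int, b: int) -> int:
    """Emulates __vsadu4: sum of absolute differences of the 4 unsigned bytes."""
    total = 0
    for shift in (0, 8, 16, 24):
        total += abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF))
    return total


def vsub4(a: int, b: int) -> int:
    """Emulates __vsub4: per-byte subtraction modulo 256, no borrow between bytes."""
    result = 0
    for shift in (0, 8, 16, 24):
        byte = (((a >> shift) & 0xFF) - ((b >> shift) & 0xFF)) & 0xFF
        result |= byte << shift
    return result


a, b = 0x01020304, 0x04030201
print(vsadu4(a, b))       # 3 + 1 + 1 + 3
print(hex(vsub4(a, b)))   # bytes 0x03, 0x01, 0xFF, 0xFD packed low to high
```

The two intrinsics differ only in the ALU work done on the loaded words. Any performance gap between them would therefore show up in instruction-level metrics rather than in shared memory transactions.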

We created this topic to report that we are confused by the results of Xavier’s profiler, which may have a bug.

I agree that we should settle the SIMD issue first.

Thanks

Yes.

Let’s focus on the SIMD issue first:
https://devtalk.nvidia.com/default/topic/1055462/jetson-agx-xavier/performance-goes-down-when-our-kernel-function-runs-on-xavier-compared-to-geforce-940mx/

Thanks.