I followed https://www2.cs.sfu.ca/~kabanets/405/ and wrote a matrix multiplication program that uses shared memory. When I tried to profile this program using nvprof, I got the following error:
nvprof --metrics shared_load_transactions_per_request,shared_store_transactions_per_request ./matrixMulShared
==8001== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel "matrixMulGlobal(Matrix, Matrix, Matrix)" (2 of 2)...
Replaying kernel "matrixMulGlobal(Matrix, Matrix, Matrix)" (done)
==8001== Error: Internal profiling error 4168:7.
matrix multiplication on CPU: 18.087000 ms
======== Error: CUDA profiling error.
When running nvprof without the --metrics argument, there is no error.
I also tried other metrics, such as gld_throughput and gld_efficiency, and the same error occurs.