Problem with nvprof

Hello,
I am trying to measure the gld_throughput and gst_throughput of my kernel with nvprof. My kernel multiplies two arrays of various dimensions. Everything worked fine until I measured the throughput for the multiplication of huge arrays (8192x8192) and the result was “OVERFLOW”. Since then, when I measure the throughput for smaller arrays that I had already measured before, the reported throughput is much lower than it used to be. The execution time, however, is the same as before. So I suppose the low throughput is not correct, because if it were, my execution time would be much longer.
Any ideas?
Thank you in advance!

Can you please provide a minimal reproducible example, or supply the following information: GPU, throughput prior to overflow, and throughput after overflow.

Most of the hardware performance counters are 32-bit. Launching a very large kernel can overflow a counter; the counter is then marked as overflowed and has to be reset.
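To give a feel for how quickly a 32-bit counter can run out, here is a back-of-the-envelope sketch in Python. It is only an illustration under assumptions of mine: a naive N x N multiplication where every thread reads 2*N elements, and one load request counted per warp of 32 threads. Your kernel and the events nvprof actually collects for gld_throughput may count differently, and the function names are made up for this example.

```python
# Rough estimate, not a model of what nvprof really counts.
# Assumptions (hypothetical): naive N x N matrix multiplication, one thread
# per output element, each thread reads 2*N input elements, and each warp of
# 32 threads issues one load request per element read.

COUNTER_MAX = 2**32 - 1   # largest value a 32-bit counter can hold
WARP_SIZE = 32


def estimated_load_requests(n: int) -> int:
    """Estimated warp-level load requests for an n x n multiplication."""
    threads = n * n
    loads_per_thread = 2 * n
    return threads * loads_per_thread // WARP_SIZE


def exceeds_32bit_counter(n: int) -> bool:
    """True if the estimate no longer fits in a 32-bit counter."""
    return estimated_load_requests(n) > COUNTER_MAX


if __name__ == "__main__":
    for n in (1024, 2048, 8192):
        print(n, estimated_load_requests(n), exceeds_32bit_counter(n))
```

Under these assumptions the 1024 and 2048 cases stay below 2^32, while the 8192 case comes out at 2^35 requests, well beyond what a single 32-bit counter can hold.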

Yes, of course.
GPU: NVIDIA Tegra X1
gld_throughput prior to overflow: Min:5e+07 GB/s Max:5e+07 GB/s Avg:5e+07 GB/s
gld_throughput after overflow: Min:43.879 KB/s Max:43.879 KB/s Avg:0.00000 B/s

The overflow happened when I launched a kernel whose execution time is 5332 seconds.

How can I reset the counter?