Problem with nvprof

I am trying to measure the gld_throughput and gst_throughput of my kernel with nvprof. My kernel multiplies two arrays, and I run it with various dimensions. Everything worked fine until I measured the throughput for the multiplication of huge arrays (8192x8192), and the result was “OVERFLOW”. Since then, when I measure the throughput for smaller arrays that I had already measured, the throughput is much lower than before, but the execution time is still the same. So I suppose the lower throughput is not correct, because if it were, my execution time would be much longer.
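One way to sanity-check the numbers is to compute an effective throughput yourself from the bytes the kernel must move and its measured run time. The sketch below assumes an element-wise product C = A * B of two float arrays, where each element of A and B is read once and each element of C is written once. For a matrix product the byte counts would be different. The function name `effective_throughput_gbs` is hypothetical and not part of nvprof.

```python
def effective_throughput_gbs(n_rows, n_cols, kernel_time_ms, elem_bytes=4):
    """Return (load_GBps, store_GBps) for a hypothetical element-wise
    product C = A * B of two n_rows x n_cols arrays.

    Assumes every element of A and B is read exactly once and every
    element of C is written exactly once (elem_bytes=4 for float).
    """
    elems = n_rows * n_cols
    seconds = kernel_time_ms * 1e-3
    load_bytes = 2 * elems * elem_bytes   # A and B are read
    store_bytes = elems * elem_bytes      # C is written
    return load_bytes / seconds / 1e9, store_bytes / seconds / 1e9
```

Because this figure depends only on the array size and the kernel time, the same size with the same execution time gives the same effective throughput. If nvprof suddenly reports a much lower value while the time has not changed, that points to a problem with the measurement rather than a real slowdown of the kernel.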
Any ideas?
Thank you in advance!