Hi, I am using the profiler to count l1_global_load_hit. However, when I feed my code a large dataset, the count returned by the profiler is -1. It works for a small dataset, though. Can anyone tell me what might be the issue? Is it possible that there are too many l1_global_load_hit events and the count overflows?
l1_global_load_hit can be incremented by 1 per cycle. The physical counter is 32 bits wide. If the kernel execution time is in the 10s-100s of seconds, it is possible that the counter will overflow, in which case the tools report -1.
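To get a feel for the numbers, here is a rough back-of-the-envelope sketch of how long a 32-bit counter lasts before it wraps. The 1 GHz clock and the hit rates are assumptions picked purely for illustration, not values for any particular GPU, and `seconds_to_overflow` is a hypothetical helper, not part of the profiler.

```python
COUNTER_BITS = 32  # counter width stated in the answer above


def seconds_to_overflow(clock_hz, hits_per_cycle):
    """Seconds until a COUNTER_BITS-wide counter wraps at a steady increment rate."""
    if clock_hz <= 0 or hits_per_cycle <= 0:
        raise ValueError("clock_hz and hits_per_cycle must be positive")
    return 2 ** COUNTER_BITS / (clock_hz * hits_per_cycle)


if __name__ == "__main__":
    assumed_clock_hz = 1e9  # assumption: 1 GHz clock, for illustration only
    # 1.0 is the stated maximum of one increment per cycle; the lower rates
    # are assumed averages for a kernel that does not hit L1 every cycle.
    for rate in (1.0, 0.1, 0.01):
        t = seconds_to_overflow(assumed_clock_hz, rate)
        print(f"{rate} hits/cycle -> wraps after about {t:.1f} s")
```

Under these assumptions the counter wraps after roughly 4.3 s at the maximum rate of one hit per cycle, and after tens to hundreds of seconds at lower average rates, which lines up with the 10s-100s of seconds mentioned above. If that is what is happening, splitting the work into shorter kernel launches should keep each count under the limit.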