Hello,
I profiled two kernels on the same inputs and the same graphics card, and got the results attached in the screenshots. The two quantities I am most concerned with are Memory Throughput [%], which is reported relative to SOL (Speed of Light), and Memory Throughput [Gbyte/second]. In the first attachment, the kernel reaches 1.46% of SOL at 2.66 Gbyte/second; in the second, the kernel reaches only 0.61% of SOL at 5.75 Gbyte/second. My question is, shouldn’t the kernel with the higher % of SOL also have the higher throughput in Gbyte/second?
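To make the mismatch concrete, here is a quick sanity check in Python. It assumes the percentage is simply the Gbyte/second figure divided by one fixed peak bandwidth for the card; that assumption is mine, not something the profiler states, and it may be exactly what I am misunderstanding. The function name `implied_peak_gbps` is just something I made up for this check.

```python
def implied_peak_gbps(throughput_gbps: float, sol_percent: float) -> float:
    """Peak bandwidth implied by a throughput and its % of SOL.

    ASSUMPTION: sol_percent == 100 * throughput_gbps / peak, with a single
    fixed peak for the card. This is my guess at how the two metrics relate.
    """
    return throughput_gbps / (sol_percent / 100.0)


# Numbers taken from the two profiler reports.
peak_a = implied_peak_gbps(2.66, 1.46)  # first kernel
peak_b = implied_peak_gbps(5.75, 0.61)  # second kernel

print(f"implied peak, kernel 1: {peak_a:.1f} Gbyte/second")
print(f"implied peak, kernel 2: {peak_b:.1f} Gbyte/second")
print(f"ratio: {peak_b / peak_a:.2f}x")
```

The two implied peaks come out at roughly 182 and 943 Gbyte/second, about a factor of five apart on the same card, so under my assumption the two metrics cannot both be measured against the same peak.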
My first thought was that the discrepancy came from the large number of passes the profiler makes over my application, since the measurements could differ from pass to pass. However, I profiled the same program with the same inputs again and saw the same discrepancy.
Any thoughts on what I might be doing wrong or misunderstanding?
Thanks!