Question about compute-limited applications

Hi, sorry for flooding the forum with questions these days…

I think I understand the bandwidth-limited case: performance suffers because the SMs stall while waiting for operands to arrive from DRAM.

However, I am less clear on the compute-limited case.
No real application can truly use the GPU's full 300+ GFlops, so is there any problem or issue with being compute-limited?

If there is, how can you determine whether a benchmark is compute-limited?

Thanks…

There are two easy ways that I can think of. Either remove the actual computation from your kernel and see whether performance improves and by how much, or downclock either the memory clock or the shader clock and see which one changes the run time.
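
A rough sketch of the first approach, in case it helps. The kernel name `probe`, the helper `timeKernel`, the array size and the `iters` count are all made up for illustration; substitute your own kernel and keep its loads and stores in place when you strip the math, otherwise the compiler may drop the memory traffic too.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: each thread loads one float, optionally runs
// `iters` dependent multiply-adds on it, and stores the result.
// With DoMath = false only the memory traffic remains.
template <bool DoMath>
__global__ void probe(const float* in, float* out, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
    if (DoMath) {
        for (int k = 0; k < iters; ++k)
            x = x * 1.0001f + 0.5f;   // stand-in for the real arithmetic
    }
    out[i] = x;   // the store keeps the load from being optimized away
}

// Time a single launch in milliseconds using CUDA events.
template <bool DoMath>
float timeKernel(const float* in, float* out, int n, int iters)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    probe<DoMath><<<blocks, threads>>>(in, out, n, iters);   // warm-up launch

    cudaEventRecord(start);
    probe<DoMath><<<blocks, threads>>>(in, out, n, iters);   // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 24;   // assumed problem size
    const int iters = 64;    // assumed amount of arithmetic per element

    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    float full    = timeKernel<true>(in, out, n, iters);
    float memOnly = timeKernel<false>(in, out, n, iters);
    printf("full: %.3f ms, memory only: %.3f ms\n", full, memOnly);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

If the two times come out close, the kernel is mostly waiting on memory. If the full version is much slower than the memory-only one, the arithmetic dominates and you are compute-limited.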