Instruction-based CUDA profiler


Is there a CUDA profiler that can show % GPU usage per instruction (as opposed to the Visual Profiler, which gives results per kernel)?

Is there a chance that a home user could run it on a laptop with Windows XP (as opposed to Nsight, for example, which requires a computer farm and the newest OS)?