Custom profiler counters

Section B.12 describes the custom profiler counters exposed to users through __prof_trigger(int index).
Is there a way to read these counter values programmatically after a kernel launch, or are they only available through the Visual Profiler?

Also, the section says the call “increments by one per warp the per-multiprocessor hardware…”, so is there
no per-thread granularity for this operation? For example, I’d like to count all texture fetches, but depending on some condition
in the code, some threads in a warp might access the texture and some might not. Is it not possible to make that distinction?
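
To make the scenario concrete, here is a minimal sketch of what I mean. It is illustrative only: the kernel name, the `i % 3` condition, and the `threadFetches` counter are made up for this post, and a plain global-memory read stands in for the texture fetch to keep the example self-contained. The atomicAdd counter is just one possible way to get a per-thread count that can be read back on the host. It is not something Section B.12 describes, and it adds overhead of its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: only some threads in each warp take the "fetch" branch.
__global__ void conditionalFetch(const float* in, float* out, int n,
                                 unsigned int* threadFetches)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = 0.0f;
    if (i % 3 == 0) {                  // made-up divergence condition
        __prof_trigger(0);             // profiler counter 0: per the guide, per warp, not per thread
        atomicAdd(threadFetches, 1u);  // assumed workaround: manual per-thread count
        v = in[i];                     // stand-in for the texture fetch
    }
    out[i] = v;
}

int main()
{
    const int n = 1024;
    float* dIn = nullptr;
    float* dOut = nullptr;
    unsigned int* dCount = nullptr;

    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMalloc(&dCount, sizeof(unsigned int));
    cudaMemset(dIn, 0, n * sizeof(float));
    cudaMemset(dCount, 0, sizeof(unsigned int));

    conditionalFetch<<<(n + 255) / 256, 256>>>(dIn, dOut, n, dCount);

    // The manual counter can be copied back after the launch;
    // this blocking copy also waits for the kernel to finish.
    unsigned int hCount = 0;
    cudaMemcpy(&hCount, dCount, sizeof(hCount), cudaMemcpyDeviceToHost);
    printf("threads that took the fetch branch: %u\n", hCount);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCount);
    return 0;
}
```

In this sketch the manual counter gives the number of threads that took the branch, while the __prof_trigger(0) call, as I read the documentation, would only tell me how many warps executed it.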