What granularity can I obtain via the NVIDIA profiler?

I have two questions.

  1. How can I use the CUPTI library to collect several events at the same time? I know that nvprof supports this.
  2. Can I use the CUPTI library to obtain events or metrics per thread or per warp?

First, CUPTI is an API intended for building profiling tools, not for direct use by individual CUDA developers. The CUPTI documentation is available online at docs.nvidia.com.

All events and metrics available through CUPTI are also available through nvprof, so using CUPTI directly does not give you any additional events. In particular, it does not expose any additional per-thread or per-warp events.
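Since nvprof exposes the same counters, the simplest way to collect several events in one run is to pass nvprof a comma-separated list via `--events` (and `--metrics` for metrics). Below is a minimal sketch of a helper that assembles such a command line. The helper name `nvprof_command`, the application path `./my_app`, and the event names are illustrative assumptions, not something from the question; which events exist depends on your GPU, so check `nvprof --query-events` and `nvprof --query-metrics` first.

```python
import shlex


def nvprof_command(app, events=(), metrics=()):
    """Build an nvprof argument list that collects several counters in one run.

    `app` is the application command as a list of strings. The event and
    metric names are passed through unchecked; whether they exist depends
    on the GPU (see `nvprof --query-events` / `nvprof --query-metrics`).
    """
    cmd = ["nvprof"]
    if events:
        # nvprof accepts a comma-separated list of event names.
        cmd += ["--events", ",".join(events)]
    if metrics:
        # Same convention for metrics.
        cmd += ["--metrics", ",".join(metrics)]
    return cmd + list(app)


if __name__ == "__main__":
    # Hypothetical application and example event names (assumptions).
    cmd = nvprof_command(
        ["./my_app"],
        events=["warps_launched", "threads_launched"],
    )
    # Print the command; run it yourself, or pass `cmd` to subprocess.run.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The values nvprof reports this way are aggregated per kernel launch, which is consistent with the point above that no per-thread or per-warp breakdown is available.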