Profile a multi-kernel app and report results on a kernel-by-kernel basis

Hi All,
Is there a way to use ncu to profile an application with multiple CUDA kernels and get cycle and bandwidth results for each kernel?
Thanks