How to profile a device function called in a kernel


I am new to Nsight, and I want to profile not only the kernel but also the device functions it calls, to see which of them is the most time-consuming. Is this possible with Nsight, and if so, how?
Thank you in advance.