Profiling GPU at source code level


Does anyone know of a profiling tool for CUDA that shows GPU execution time at the source code level, i.e. so that you can see the time spent in each function within a complex kernel?

Many thanks

On Linux: the NVIDIA Visual Profiler (nvvp), with the code compiled using nvcc's -lineinfo switch.
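To make that concrete, here is a minimal sketch of the kind of kernel the question describes, with the build line in the header comment. The file name, kernel, and device functions are all made up for illustration. The __noinline__ qualifiers are my own addition: they ask the compiler to keep the device functions as separate calls, which should make per-function time easier to tell apart, but the compiler treats it as a hint and it can change performance compared with the inlined version.

```cuda
// profile_demo.cu (hypothetical file name)
// Build with source line information embedded for the profiler:
//   nvcc -lineinfo -o profile_demo profile_demo.cu
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative device functions; __noinline__ is a hint to keep them
// as separate functions instead of folding them into the kernel body.
__device__ __noinline__ float cheap_step(float x) {
    return x * 2.0f + 1.0f;
}

__device__ __noinline__ float costly_step(float x) {
    float acc = x;
    for (int i = 0; i < 256; ++i) {
        acc = sinf(acc) + x;   // deliberately heavier than cheap_step
    }
    return acc;
}

// A "complex" kernel that spends time in more than one function.
__global__ void complex_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = costly_step(cheap_step(in[i]));
    }
}

int main() {
    const int n = 1 << 20;
    float* in = nullptr;
    float* out = nullptr;

    // Managed memory keeps the host code short for this sketch.
    if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) in[i] = 0.001f * i;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    complex_kernel<<<blocks, threads>>>(in, out, n);

    // Check both the launch and the kernel execution for errors.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("out[0] = %f\n", out[0]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Unlike -G, the -lineinfo switch does not turn off device code optimization, so the timings you see should still be representative of a release build.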

And on Windows?

Nsight Visual Studio Edition for Windows has built-in profiling features.

It seems the NVIDIA Visual Profiler is in fact a cross-platform tool and is also available on Windows.