As far as I can see, there’s no way to profile, or even view a timeline of, OpenCL applications in Nsight. Is that correct? Aside from using event callbacks to get the raw start and stop times of kernels, what other options do I have for profiling or timelining kernels in an OpenCL application?
(Note: OpenCL is used because the GPU code needs to be cross-vendor.)