My company recently provided me with an RTX 3090 for testing. I’ve used the Visual Profiler for years and have come to rely on it quite heavily. I really like the concept of Nsight Systems, but I struggled to perform basic CUDA profiling with it. I’m wondering if there are any plans to address some of these issues:
-“Show in events view” has to be hit to switch between looking at memory and kernel operations. Unless you’re currently in Events view for an operation, you can’t click on an individual kernel to inspect its diagnostics. Even when you can select a kernel, it requires a double click. Together, these make navigating a profile very slow: in the Visual Profiler a single click selected anything, whereas now it typically takes four.
-The aggregate information for kernels seems to have been removed. I rely heavily on viewing the total runtime across all kernels, the total kernel run count, and the summed execution time of each particular kernel.
-Separate colors for different kernels are missing. One of the best parts of the Visual Profiler was its aesthetic appeal. I frequently used it to present CUDA concepts to coworkers, because the tool greatly simplified complex GPU operations. Nsight Systems is difficult to look at, even for a programmer who knows what they’re doing.
-Expanding kernels with the + button one by one is very slow. Holding Ctrl to show 5 feels like a hacky workaround.
-Ctrl + drag requires selecting from a context menu to zoom in, even though zooming is the only operation that really needs to be mapped to this key combination.
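To make the aggregate-information point concrete, here is a minimal Python sketch of the kind of summary I mean: a per-kernel run count and summed execution time, plus totals across all kernels. The kernel names and durations are made up for illustration, and `summarize` is a hypothetical helper, not something taken from either profiler.

```python
from collections import defaultdict

# Hypothetical (kernel name, duration in microseconds) records.
# These values are invented for illustration, not real profiler output.
launches = [
    ("scale_kernel", 120.0),
    ("reduce_kernel", 300.0),
    ("scale_kernel", 130.0),
    ("reduce_kernel", 310.0),
    ("scale_kernel", 125.0),
]


def summarize(records):
    """Return per-kernel stats, total launch count, and total runtime."""
    per_kernel = defaultdict(lambda: {"count": 0, "total_us": 0.0})
    for name, duration_us in records:
        per_kernel[name]["count"] += 1
        per_kernel[name]["total_us"] += duration_us

    total_count = sum(stats["count"] for stats in per_kernel.values())
    total_us = sum(stats["total_us"] for stats in per_kernel.values())
    return dict(per_kernel), total_count, total_us


per_kernel, total_count, total_us = summarize(launches)

# Print the longest-running kernels first.
for name, stats in sorted(per_kernel.items(), key=lambda kv: -kv[1]["total_us"]):
    print(f"{name}: {stats['count']} runs, {stats['total_us']:.1f} us total")
print(f"all kernels: {total_count} runs, {total_us:.1f} us total")
```

Having these three numbers visible at a glance is what made it quick to spot which kernel dominated a run.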
The Visual Profiler was brilliant in its simplicity. I really like the additional information that Nsight Systems is trying to present, but it’s fundamentally very difficult to do the thing it should primarily be used for: profiling CUDA code. I’m optimistic that the tool will improve in the next few years, so hopefully this feedback can contribute to that.