I noticed that Nsight Compute shows histograms of block and warp durations under Launch Statistics in the Details View. I’d like to access the raw data behind these histograms for my research, but I couldn’t find it in the Raw View. Is there a way to get this data or to compute these results?
To answer your question in short: there is currently no way to retrieve this data other than parsing the report file yourself. There is some information on how to do this here: https://docs.nvidia.com/nsight-compute/CustomizationGuide/index.html#report-file-format. Using the Python rule system to read this data would be an easier alternative, but we found a bug affecting these metrics that currently makes this impossible.
We will look into fixing this problem in a future release of Nsight Compute.
Those metrics are called “instanced metrics”, since they contain values for multiple instances of the represented domain (in this case warp/block runtime bins). Since they also happen to contain a non-instanced value in this case (the sum of all per-bin counts), the instanced values are not shown on the Raw page.
Thanks for your detailed answer. That’s very helpful! To clarify, is it currently possible to get the “instanced values” of the per-bin counts by parsing the report file? Or does the bug you mentioned prevent that as well?
I found that sass__block_histogram and sass__warp_histogram are not available on GV100. This is also documented in the known issues section of the Nsight Compute documentation. Will these metrics be provided in the future?
I really look forward to Nsight Compute catching up with nvprof, which reports instanced metrics nicely. Unfortunately, I don’t think nvprof provides block/warp histograms.
I took your suggestion and tried reading the data through the rules system. It would be great if the following issue could be fixed. Is it the bug you referred to? I really need these metrics for my research. Please let me know if I can help with testing or anything else.
The instanced values all seem to have been overwritten with zeros. I added the following code to the apply() function in LaunchStatistics.py.
from NvRules import metric_instances
block_hist = action.metric_by_name("sass__block_histogram")
The output is 47[0.0, 0.0, …, 0.0] for a given profile, while the histogram displays some non-zero data. The number of bins equals num_instances, but every value has been wiped to zero.
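For anyone trying to reproduce this, here is a minimal sketch of the check, runnable outside Nsight Compute. It assumes the metric object exposes num_instances() and as_double(instance), as the NvRules metric interface does in the versions I am aware of; please verify against your release. FakeMetric, read_bins and looks_wiped are hypothetical names made up for this illustration, and the numbers are invented, not taken from a real profile.

```python
from typing import List, Optional, Tuple


class FakeMetric:
    """Hypothetical stand-in for an NvRules metric object (illustration only)."""

    def __init__(self, total: float, bins: List[float]) -> None:
        self._total = total
        self._bins = bins

    def num_instances(self) -> int:
        return len(self._bins)

    def as_double(self, instance: Optional[int] = None) -> float:
        # Without an index: the non-instanced (aggregate) value.
        # With an index: the value of that instance (histogram bin).
        return self._total if instance is None else self._bins[instance]


def read_bins(metric) -> Tuple[float, List[float]]:
    """Return (aggregate value, list of per-instance values)."""
    bins = [metric.as_double(i) for i in range(metric.num_instances())]
    return metric.as_double(), bins


def looks_wiped(metric) -> bool:
    """True if instances exist and are all zero while the aggregate is non-zero."""
    total, bins = read_bins(metric)
    return bool(bins) and total != 0 and all(v == 0 for v in bins)


# Made-up data: one metric with intact bins, one showing the symptom above.
healthy = FakeMetric(total=6.0, bins=[1.0, 2.0, 3.0])
wiped = FakeMetric(total=6.0, bins=[0.0, 0.0, 0.0])
print(read_bins(healthy))
print(looks_wiped(healthy), looks_wiped(wiped))
```

Inside a rule, the same two helper functions could presumably be applied to the object returned by action.metric_by_name("sass__block_histogram"). A result of True from looks_wiped would match the symptom described here: the instance count is right, but the per-bin values are gone.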
Oh right, that is a typo. I meant GV100. Thanks for answering that!
The Pascal GPU I used is a GTX 1070. I read that GP10x is supported. By the way, Nsight Compute profiles fine without MPS. It’s only when MPS is turned on that Nsight Compute reports that profiling is not supported.
Thanks for looking at this and please let me know if any information is needed to reproduce this issue.
Meanwhile, a quick question: does the Nsight Compute profiler work with concurrent kernel execution? I tried it with two kernels that should be able to execute concurrently, but it seems that Nsight Compute only profiles one kernel at a time. In other words, is concurrency disabled during profiling?
If so, is there any other way to get block durations for concurrent kernel execution, other than reading the global clock?