How to access (or compute) block durations and warp durations from raw data?

Hi,

I noticed that there are histograms of block and warp durations under Launch Statistics in the Details View in Nsight Compute. I’m hoping to access the raw data behind these durations for my research, but I didn’t find it in the Raw View. Is there a way to get this data or to compute these results?

Thanks!

Ming

Can anyone from NVIDIA please answer this question? Thanks!

Hi Ming,

To answer your question in short: there is currently no way to retrieve this data other than parsing the report file yourself. There is some information on how to do this here: https://docs.nvidia.com/nsight-compute/CustomizationGuide/index.html#report-file-format. Using the Python rule system to read this data would be an alternative, easier approach, but we found a bug with respect to these metrics that makes this currently impossible.

We will look into fixing this problem in a future release of Nsight Compute.

As background info:

The metrics you are looking for are called sass__block_histogram and sass__warp_histogram. This can be seen from the file LaunchStatistics.section within the sections directory of the Nsight Compute installation. .section files define what is collected for each kernel launch, and how the data is shown in the report. You can refer to https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#profiler-report-details-page for more details on this.
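For example, a quick way to see which metric names a section collects is to scan its .section file. A minimal sketch, assuming the .section files are plain text and that metric references appear as Name: "…" entries; the installation path below is hypothetical and should be adjusted to your system:

import re
from pathlib import Path

# Hypothetical location; point this at the sections directory of your
# Nsight Compute installation.
sections_dir = Path("/usr/local/cuda/nsight-compute/sections")

# Print every metric name referenced by the Launch Statistics section.
text = (sections_dir / "LaunchStatistics.section").read_text()
for name in re.findall(r'\bName:\s*"([^"]+)"', text):
    print(name)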

Those metrics are called “instanced metrics”, since they contain values for multiple instances of the represented domain (in this case warp/block runtime bins). Since they also happen to contain a non-instanced value in this case (the sum of all per-bin counts), the instanced values are not shown on the Raw page.
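To illustrate with made-up numbers (not actual profiler output), the shape of such a metric looks like this:

# Made-up example values, purely to illustrate the shape of an instanced metric.
per_bin_counts = [12, 30, 5, 1]        # instanced values: one block count per duration bin
overall_value = sum(per_bin_counts)    # non-instanced value: the sum over all bins
print(overall_value)                   # 48 -- only this value is shown on the Raw page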

Thanks for your detailed answer. That’s very helpful! To clarify, is it possible to get the “instanced values” of the per-bin counts by parsing the report file at the moment? Or does the bug you referred to prevent that as well?

I found that sass__block_histogram and sass__warp_histogram are not available on GV100. This is actually also documented in the known issues section of the Nsight Compute documentation. Will these metrics be provided in the future?

I really look forward to Nsight catching up with nvprof, which reports instanced metrics nicely. Unfortunately, I don’t think nvprof provides block/warp histograms.

Thanks,
Ming

I took your suggestion and attempted to read the data through the rules system. It would be great if the following issue could be fixed; it is probably the bug you referred to? I really need these metrics for my research, so please let me know if I can help test anything.

The instanced values all seem to be overwritten with zero. I added the following code to the apply() function in LaunchStatistics.py.

from NvRules import metric_instances
# Read the block duration histogram for the current kernel launch (action)
# and print the number of bins followed by the per-bin counts.
block_hist = action.metric_by_name("sass__block_histogram")
print(str(block_hist.num_instances()) + str(metric_instances(block_hist)))

The output is 47[0.0, 0.0, …, 0.0] for a given profile, while the histogram in the UI displays non-zero data. The number of bins matches num_instances(), but the per-bin contents are all wiped to zero.

Thanks!

Yes, that is the bug I was referring to, and it will be fixed in a future release. We are also looking into providing more details on how to parse the report file itself, e.g. by means of sample code.

Hi Felix,

I managed to parse the report file and got the block/warp histograms. But Nsight Compute profiling does not seem to work with MPS on Pascal? It works on a Volta.

Also, will sass__block(warp)_histogram be supported on NVIDIA in a future release?

Thanks!

I assume your question is if those metrics will be supported on GV100 and newer architectures in a future release? The answer to that is yes, we are planning on supporting those.

What exact Pascal GPU are you using?

Oh right, that is a typo. I meant GV100. Thanks for answering that!

The Pascal GPU I used is a 1070. I read that GP10x is supported. By the way, Nsight Compute profiles fine without MPS; it is only when MPS is turned on that Nsight Compute reports that profiling is not supported.

It’s a GTX 1070. Any clue why Nsight Compute doesn’t work with MPS turned on for this GPU?

I do not currently know exactly why it wouldn’t work with MPS, but we will check this internally and I will update here once I have more information.

Thanks for looking at this and please let me know if any information is needed to reproduce this issue.

Meanwhile, a quick question: does the Nsight Compute profiler work with concurrent kernel execution? I tried it with two kernels that should be able to execute concurrently, but it seems that Nsight only profiles one kernel at a time. In other words, concurrency is disabled during profiling, is that true?

If so, is there any other way to get block durations (other than reading the global clock) for concurrent kernel execution?

Thanks!

Ming

Yes, the current version of Nsight Compute serializes the kernels and profiles them one after the other.

Hi Ming,

@MingYang4

Note that the issue of reading the data using the Python rule system has been fixed in Nsight Compute version 2019.5 (which is part of the CUDA Toolkit 10.2).
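With that fix in place, the per-bin counts should no longer come back as zero. As a minimal sketch based on your earlier snippet (placed inside the apply() function of an existing rule such as LaunchStatistics.py, where action is already in scope; exact helper names may vary between versions), something like this should now print the real bin contents:

from NvRules import metric_instances

# Sketch: read both histograms for the current kernel launch (action) and
# print the number of bins together with the per-bin counts.
block_hist = action.metric_by_name("sass__block_histogram")
warp_hist = action.metric_by_name("sass__warp_histogram")
print("block bins:", block_hist.num_instances(), metric_instances(block_hist))
print("warp bins:", warp_hist.num_instances(), metric_instances(warp_hist))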

That’s great! Thanks for the reply. Does Nsight Compute work with concurrent kernel execution and MPS now?