In the article "PGI Accelerator Programming Model on NVIDIA GPUs, Part 2: Performance Tuning":
The accelerator information output begins at the subheading:
and discusses the GPU region, which it breaks down into data time and kernel time.
This is logical: what else could the GPU spend time on besides data transfer and kernel computation?
The times always add up to the region time.
Again, this is logical.
However, in many of the examples that follow this does not hold: the kernel time and the data transfer time do not add up to the region time.
I am unsure why this happens. It seems that data time and kernel time should always add up to region time.
Why is this?
I would need a specific example, but there is also initialization time to factor in. We did have an issue in early releases where the total time was incorrect when a data region was used. There was also a problem with CUDA 4.0 just after it first came out. Both of these issues have been resolved.
Accelerator Kernel Timing data
32: region entered 1 times
time(us): total=1182682 init=1180869 region=1813
w/o init: total=1813 max=1813 min=1813 avg=1813
34: kernel launched 1 times
time(us): total=170 max=170 min=170 avg=170
Here is what I am talking about. As you can see in this example, kernel time + data time = region time. This holds until you get to the later examples in many of the whitepapers; then it no longer holds.
Initialization time has nothing to do with it. Why does this hold sometimes and not at other times?
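One way to check the accounting in the timing excerpt above is to do the arithmetic explicitly. The Python sketch below assumes the output layout is exactly as quoted; `parse_timing` and `fields` are hypothetical helpers, not part of the PGI tools. It confirms that the reported total is initialization plus region time (1180869 + 1813 = 1182682 us), so initialization is already separated out, and it computes how much of the 1813 us region time is not spent in the kernel. If kernel + data = region holds exactly, that remainder is the data transfer time. Comparing the remainder against the data line in a full profile would show how much time, if any, is attributed to neither category.

```python
import re

# Timing excerpt copied from the post above (layout assumed to match it exactly).
SAMPLE = """\
32: region entered 1 times
time(us): total=1182682 init=1180869 region=1813
w/o init: total=1813 max=1813 min=1813 avg=1813
34: kernel launched 1 times
time(us): total=170 max=170 min=170 avg=170
"""


def fields(line):
    """Turn 'key=123' pairs on one line into a dict of ints."""
    return {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", line)}


def parse_timing(text):
    """Hypothetical helper: return (region_fields, kernel_totals_us).

    Assumes each 'region entered' or 'kernel launched' header is
    immediately followed by its 'time(us):' line, as in the excerpt.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    region, kernels = None, []
    for header, timing in zip(lines, lines[1:]):
        if "region entered" in header:
            region = fields(timing)
        elif "kernel launched" in header:
            kernels.append(fields(timing)["total"])
    return region, kernels


region, kernels = parse_timing(SAMPLE)

# Total wall time = one-time initialization + time inside the region.
assert region["total"] == region["init"] + region["region"]

kernel_us = sum(kernels)
# Region time not spent in kernels: this is what data transfer would
# have to account for if kernel + data = region holds exactly.
non_kernel_us = region["region"] - kernel_us
print(f"region={region['region']} us, kernel={kernel_us} us, "
      f"non-kernel={non_kernel_us} us")
```

For the excerpt above this prints `region=1813 us, kernel=170 us, non-kernel=1643 us`. In a region with several kernels, `kernels` collects one total per kernel, and the same subtraction applies.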