Time variation based on number of program executions

Hi there,

First, I measure the time t a program needs to run on the GPU once. Then I execute the same program on the same GPU 100 times in a row, and the average time I get is smaller than t. Why is this?

Thank you.

A general rule of performance benchmarking is to never base measurements on the first run. This is independent of the use of GPUs. Depending on what an application is doing, two common benchmarking methodologies are (typically 3 <= N <= 10; see the sketch after the list):

  1. Report the performance based on the fastest of N runs
  2. Report the performance based on the average of N runs (excluding the first)
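
Here is a minimal sketch of how both methodologies could be implemented with CUDA events (the toy kernel, problem size, and N = 10 are placeholders for illustration, not anything from the original question): one untimed warm-up launch, then N timed runs whose fastest and average times are both reported.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel used only to have something to time.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;   // placeholder problem size
    const int N = 10;        // number of timed runs
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up run: absorbs one-time initialization costs and is not measured.
    scale<<<(n + 255) / 256, 256>>>(d_x, n, 1.01f);
    cudaDeviceSynchronize();

    float best = 1e30f, sum = 0.0f;
    for (int r = 0; r < N; ++r) {
        cudaEventRecord(start);
        scale<<<(n + 255) / 256, 256>>>(d_x, n, 1.01f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        if (ms < best) best = ms;
        sum += ms;
    }
    printf("fastest of %d runs: %.3f ms, average: %.3f ms\n", N, best, sum / N);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
```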

The first run (or even the first few runs) of any software tends to be burdened by many different kinds of cold-start overhead, such as loading code into system memory, one-time software initialization, or priming hardware structures such as caches and TLBs.

Note that the execution time of GPU kernels can vary significantly for an identical workload due to dynamic clocking, which depends on GPU temperature and power consumption, and can also differ between individual instances of the same GPU model. Under heavy load, for example, GPU temperature tends to rise and clocks tend to drop. The latest CPU generations use similar dynamic clocking schemes that can likewise lead to variable performance.
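
One way to observe this effect (a sketch, assuming an NVIDIA GPU with the NVML library available and linked via -lnvidia-ml; none of this is from the original reply) is to log the current SM clock and temperature between benchmark runs:

```cuda
#include <cstdio>
#include <nvml.h>

int main() {
    // Initialize NVML and query device 0.
    if (nvmlInit_v2() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex_v2(0, &dev) == NVML_SUCCESS) {
        unsigned int smClockMHz = 0, tempC = 0;
        // Current SM clock in MHz and GPU temperature in degrees C.
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &smClockMHz);
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &tempC);
        printf("SM clock: %u MHz, temperature: %u C\n", smClockMHz, tempC);
    }

    nvmlShutdown();
    return 0;
}
```

Logging something like this before and after a batch of timed runs makes it easier to tell whether a slowdown coincides with rising temperature and falling clocks rather than anything in the application itself.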
