Why warm-up?

In some samples, such as 3_imaging_bilateralFilter, I see the program warm up (run the function bilateralFilterRGBG) before running 150 cycles…

What does this mean?

Why do people clear their throat before giving a speech?

Same reason essentially.

Is this a reasonable answer? I haven’t seen this before.

The first kernel call takes longer, you don’t want to measure this in your benchmark run. It may involve a Just-in-time compilation which can take many seconds for complex kernels.

This is why you generally call each kernel once before benchmarking. Sometimes it’s also referred to as “cooking” a kernel.
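As a rough sketch of that pattern (in plain Python, with a CPU function standing in for the kernel launch, since the actual CUDA timing code from the sample isn’t shown here — in real CUDA code the warm-up and timed calls would be kernel launches followed by `cudaDeviceSynchronize()`):

```python
import time

def work(data):
    # Hypothetical stand-in for a kernel; the first real kernel call is
    # where JIT compilation and other one-time costs would land.
    return sum(x * 1.000001 + 0.5 for x in data)

def benchmark(fn, data, cycles=150):
    fn(data)                       # warm-up run: absorb one-time costs, untimed
    t0 = time.perf_counter()
    for _ in range(cycles):        # only the steady-state runs are timed
        fn(data)
    t1 = time.perf_counter()
    return (t1 - t0) / cycles      # average seconds per cycle

avg = benchmark(work, [1.0] * 10000, cycles=10)
print(f"avg per cycle: {avg:.6f} s")
```

The point is simply that the untimed first call keeps the one-time startup cost out of the average you report.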

Similarly, when I give a speech I don’t want to start with a broken voice, so I clear my throat before commencing.

Maybe a car analogy works better.

If you want to race someone at a stoplight, you’ll put your car into gear first and rev the engine before the light turns green. This gives you a much faster start.

Note that “warmup” prior to a benchmark is not specific to GPUs. In general, when doing performance comparisons via benchmarking, one is interested in steady-state performance, not including one-time startup costs (special studies are sometimes conducted to address just those, where important). For example, on CPUs, there is initial overhead from “cold” caches and TLBs causing many costly misses for memory access.

One variation of the warmup scheme is the strategy used by the famous STREAM memory bandwidth test: It simply reports the best performance observed in ten consecutive runs. In addition to mitigating startup issues, this scheme also virtually eliminates the sizeable run-to-run variability often seen in “long distance” data transfers, e.g. to main system memory or across PCIe.
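A minimal sketch of that best-of-N reporting scheme (plain Python again; STREAM itself is a Fortran/C memory-bandwidth benchmark, and the helper name here is made up for illustration):

```python
import time

def best_of_n(fn, n=10):
    # STREAM-style reporting: run n times, keep the best (shortest) time.
    # This suppresses both startup costs and run-to-run noise, since the
    # fastest run is the one least disturbed by either.
    best = float("inf")
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        t1 = time.perf_counter()
        best = min(best, t1 - t0)
    return best

t = best_of_n(lambda: sum(range(100000)))
print(f"best of 10: {t:.6f} s")
```

Reporting the minimum rather than the mean is deliberate: any interference only ever makes a run slower, so the fastest observation is the closest estimate of the hardware’s steady-state capability.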

Perhaps you haven’t been looking at well written benchmark code.

NASA (US Space Agency) wrote NPB - NASA Parallel Benchmarks - back in the 1990’s as a general set of compute benchmarks to evaluate parallel architectures.

Those benchmarks (e.g. the CG benchmark) include an untimed “warm-up” run before they begin timing. Google for that and study those codes - they are publicly available.

This is not unique to CUDA/GPUs/NVIDIA and has been common practice in Computer Science for a long time.

Oh, I understand. Thank you very much for answering.