Why warm-up?

archernzy · March 17, 2017, 7:54am

In some samples, such as 3_imaging_bilateralFilter, I see the program warm up (run the function bilateralFilterRGBG) before running 150 cycles…

What does this mean?

cbuchner1 · March 17, 2017, 8:08am

Why do people clear their throat before giving a speech?

Same reason essentially.

archernzy · March 17, 2017, 8:14am

Is this a reasonable answer? I haven’t seen this before.

cbuchner1 · March 17, 2017, 9:52am

The first kernel call takes longer, and you don’t want to measure this in your benchmark run. It may involve just-in-time compilation, which can take many seconds for complex kernels.

This is why you generally call each kernel once before benchmarking. Sometimes it’s also referred to as “cooking” a kernel.

Similarly, when I give a speech I don’t want to start with a broken voice, so I clear my throat before commencing.

Maybe a car analogy works better.

If you want to race someone at a stoplight, you’ll put your car into gear first and rev the engine before the light turns green. This will give you a much faster start.
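The "call once before benchmarking" pattern described above could be sketched roughly as follows. This is a minimal host-side illustration in plain C++, not the code from the CUDA sample: `time_after_warmup` and its parameters are hypothetical names, and `work` stands in for whatever is being benchmarked. For a GPU kernel, `work` would need to include a synchronization (for example `cudaDeviceSynchronize()`), since kernel launches are asynchronous.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` once untimed (the warm-up), then returns
// the average wall-clock time in milliseconds over `iterations` timed runs.
template <typename F>
double time_after_warmup(F&& work, std::size_t iterations) {
    assert(iterations > 0 && "need at least one timed run");
    using clock = std::chrono::steady_clock;

    // Warm-up call: absorbs one-time costs (e.g. JIT compilation, cold caches)
    // so they do not show up in the measurement.
    work();

    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

With this sketch, something like `time_after_warmup(run_filter, 150)` would call the assumed `run_filter` 151 times in total: one untimed warm-up plus 150 timed cycles.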

njuffa · March 17, 2017, 12:43pm

Note that “warmup” prior to a benchmark is not specific to GPUs. In general, when doing performance comparisons via benchmarking, one is interested in steady-state performance, not including one-time startup costs (special studies are sometimes conducted to address just those, where important). For example, on CPUs, there is initial overhead from “cold” caches and TLBs causing many costly misses for memory access.

One variation of the warmup scheme is the strategy used by the famous STREAM memory bandwidth test: It simply reports the best performance observed in ten consecutive runs. In addition to mitigating startup issues, this scheme also virtually eliminates the sizeable run-to-run variability often seen in “long distance” data transfers, e.g. to main system memory or across PCIe.
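A best-of-N scheme like the one described above might look like the sketch below. It is not taken from the STREAM source; `best_of_runs` is a hypothetical helper, and the default of ten runs simply mirrors the number mentioned in the post. Reporting the fastest run means a slow first iteration never becomes the result unless every run is that slow.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical helper: times `work` on `runs` consecutive runs and returns
// the shortest observed wall-clock time in milliseconds.
template <typename F>
double best_of_runs(F&& work, std::size_t runs = 10) {
    assert(runs > 0 && "need at least one run");
    using clock = std::chrono::steady_clock;

    double best_ms = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();

        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        // Keep only the fastest run; slower runs (startup, noise) are discarded.
        best_ms = std::min(best_ms, elapsed.count());
    }
    return best_ms;
}
```

Compared with a single untimed warm-up followed by an average, taking the minimum also filters out occasional slow runs caused by interference elsewhere in the system, at the cost of reporting a best case rather than a typical case.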

Robert_Crovella · March 17, 2017, 2:02pm

Perhaps you haven’t been looking at well written benchmark code.

NASA (US Space Agency) wrote NPB - NASA Parallel Benchmarks - back in the 1990s as a general set of compute benchmarks to evaluate parallel architectures.

Those benchmarks (e.g. the CG benchmark) include an untimed “warm-up” run before they begin timing. Google for that and study those codes - they are publicly available.

This is not unique to CUDA/GPUs/NVIDIA and has been common practice in Computer Science for a long time.

archernzy · March 20, 2017, 1:21am

Oh, I understand. Thank you very much for answering.