I am currently studying the book Professional CUDA C Programming and have learned that a warmup kernel is used to "preheat" the GPU before timing.
What is the content of the warmup function? I could not find its specific definition online, only an empty warmup kernel, so is the warmup kernel simply empty?
And I still don't understand why we need the warmup kernel for preheating. Will the GPU run better after it has been warmed up?
Thanks to all.
The code provided in the book is as follows, but there is no specific definition of the warmup function.
cudaMemcpy(d_idata, h_idata, bytes, cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
iStart = seconds();
warmup<<<grid, block>>>(d_idata, d_odata, bytes);
cudaDeviceSynchronize();
iElaps = seconds() - iStart;
cudaMemcpy(h_odata, d_odata, grid.x * sizeof(int), cudaMemcpyDeviceToHost);
gpu_sum = 0;
for (int i = 0; i < grid.x; i++) {
    gpu_sum += h_odata[i];
}
// iElaps is a double holding seconds, so print it with %f, not %d
printf("gpu Warmup elapsed %f sec gpu_sum: %d <<<grid %d block %d>>>\n", iElaps, gpu_sum, grid.x, block.x);
cudaMemcpy(d_idata, h_idata, bytes, cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
iStart = seconds();
reduceNeighbored<<<grid, block>>>(d_idata, d_odata, bytes);
cudaDeviceSynchronize();
iElaps = seconds() - iStart;
cudaMemcpy(h_odata, d_odata, grid.x * sizeof(int), cudaMemcpyDeviceToHost);
gpu_sum = 0;
for (int i = 0; i < grid.x; i++) {
    gpu_sum += h_odata[i];
}
// as above, iElaps is a double holding seconds, so print it with %f
printf("gpu Neighbored elapsed %f sec gpu_sum: %d <<<grid %d block %d>>>\n", iElaps, gpu_sum, grid.x, block.x);
It is curious that the book does not mention the purpose of, and design criteria for, warmup kernels. I have not read this book, but various questions over the years that were motivated by contents of this book lead me to believe that it may not be the best learning resource.
To avoid misunderstandings: “warmup” is used in a figurative sense here, not in the literal (physical) sense.
In my experience the warmup kernel is in most cases identical with the kernel under test, which is executed a few times before measurements / profiling commence after “steady state” has been reached. In this context, “few” typically means a single-digit number. In some cases the warmup kernel may have reduced complexity compared to the kernel under test, but it seems very unusual in my experience that it would consist only of an empty kernel.
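To illustrate the common pattern described above, here is a minimal sketch in which the kernel under test itself serves as the warmup kernel. It reuses the names from the book's example (reduceNeighbored, d_idata, h_idata, d_odata, grid, block); the warmup count of 3 is an arbitrary single-digit choice, and the input is re-copied before the timed run because the reduction modifies d_idata in place. This is a sketch of the practice, not the book's code.

```cuda
// Warm up by launching the kernel under test a few times. The results of
// these launches are discarded; their only purpose is to bring caches,
// clocks, and the CUDA software stack into steady state.
for (int i = 0; i < 3; i++) {
    cudaMemcpy(d_idata, h_idata, bytes, cudaMemcpyHostToDevice);
    reduceNeighbored<<<grid, block>>>(d_idata, d_odata, bytes);
}
cudaDeviceSynchronize();   // make sure all warmup launches have finished

// Timed measurement: restore the input, then time a single launch.
cudaMemcpy(d_idata, h_idata, bytes, cudaMemcpyHostToDevice);
iStart = seconds();
reduceNeighbored<<<grid, block>>>(d_idata, d_odata, bytes);
cudaDeviceSynchronize();   // kernel launches are asynchronous; wait before stopping the clock
iElaps = seconds() - iStart;
```

Note that cudaDeviceSynchronize() before reading the clock is essential, since kernel launches return control to the host immediately.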
Complex hardware devices like GPUs contain numerous administrative / control mechanisms, buffers, and caches whose timing behavior differs based on their current state. When a new kernel is launched, these mechanisms are "out of tune" with the new workload, leading to increased execution times. After a few invocations of the kernel, these mechanisms have adapted to the new workload, at which point we get a more realistic picture of kernel execution characteristics. The same applies to the host-side components of the CUDA software stack, in particular the driver and the runtime library, and how they utilize CPU resources.
Note: Modern CPUs and GPUs use dynamic clocking for their circuits, meaning that the operating frequency depends on characteristics of the workload as well as environmental factors, one of which is temperature. This means that performance can vary as the GPU heats up with prolonged run time. If one needs to consider this factor in performance assessments, one must typically run a GPU for several minutes to achieve "thermal steady state". "Several" typically means 5 minutes or thereabouts.
Thank you very much for your detailed answer. I understand the purpose of warmup. Do you know what the warmup function is? Can you recommend some resources for learning CUDA?
I do not know what the warmup function is in the book. I have not read the book, and I do not own a copy of it.
I already explained that in common industry practice, one does not provide a separate, specialized warmup kernel, but instead uses the kernel under test for warmup purposes, because this applies the actual workload to which one would like the hardware mechanisms to adapt during the warmup phase.
Unless the book explains why a separate warmup kernel is used here, I would strongly suggest ignoring this particular usage, as it does not appear to teach anything useful or productive, but instead causes confusion.
Okay, thank you for your answer. I’ll ignore this part and continue learning. If possible, could you recommend some other materials for learning CUDA for me?
When it comes to CUDA, my own learning days are about 15 years in the past, so I am not in a position to recommend resources for self-learners. Other forum participants may be able to provide up-to-date guidance.
Okay, thank you for your reply.
Perhaps one should state that a warmup function is not strictly necessary.
It has two important uses:
- for benchmarking/profiling, to get a more typical or stable number without one-time effects
- for time-critical applications that run in a loop: call the warmup kernel before entering the loop so that one-time effects do not appear inside the loop.
If your overall program just needs to run fast (with or without a loop), then a warmup kernel is not necessary (and would even slow down the overall runtime).
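The second use case, warming up before a latency-sensitive loop, can be sketched as follows. Everything here is hypothetical: processFrame is a placeholder kernel name, and d_buf, grid, block, and running stand in for whatever the real application uses.

```cuda
// Hypothetical time-critical application: pay one-time effects (module
// loading, cache population, clock ramp-up) up front, before the loop,
// so the first "real" iteration is not an outlier.
processFrame<<<grid, block>>>(d_buf);   // warmup launch, result discarded
cudaDeviceSynchronize();                // wait until the warmup has completed

while (running) {
    // every iteration now sees steady-state launch and execution behavior
    processFrame<<<grid, block>>>(d_buf);
    cudaDeviceSynchronize();
}
```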
Thank you for your detailed answer. I have gained a deeper understanding of warmup.
A good place to start is this training series (links to slides and recordings are at the end of each lesson page):
Okay, thank you for recommending the materials. I will study them.