Is manual prefetching useless?

i_d · August 18, 2011, 10:57am

In the forum I found a quote from MisterAnderson42 stating that:

Does this mean there is no point in prefetching by hand [1]?

[1] Prefetching example from the same post:

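__global__ void kernel(unsigned *g_mem) {

    unsigned thid = blockIdx.x * blockDim.x + threadIdx.x; // assumed global thread index

    unsigned x = g_mem[thid]; // "prefetch" x

    // .. do a lot of arithmetic that does not depend on 'x' ..

    // ** x is used here **

}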

tera · August 18, 2011, 1:08pm

Note that the compiler automatically reorders load instructions as far ahead of the computation as possible anyway. So there will be no difference unless there is a dependency that the compiler cannot resolve automatically.

i_d · August 18, 2011, 1:34pm

Hi Tera,

Thanks for your answer. One more thing: how does this fit together with what Kirk and Hwu write in their book about prefetching?

See chapter 5, page 14f, on prefetching in the CUDA Textbook (IAP 2009 CUDA@MIT / 6.963).

tera · August 18, 2011, 5:03pm

The __syncthreads() call in their example acts as a memory barrier, thus qualifying as a dependency the compiler may not resolve automatically.
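
For illustration, here is a minimal sketch of that pattern, assuming a tiled matrix multiply similar in spirit to the textbook's example. The kernel name matmul_prefetch, the tile size, and the host harness are my own illustrative choices, not code from the book, and the harness uses unified memory (newer than this thread) purely for brevity. The loads for the next tile are issued by hand right after the first barrier, so they can overlap with the arithmetic on the current tile.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 16  // illustrative tile size (assumption)

// C = A * B for square n x n matrices; n is assumed to be a multiple of TILE.
__global__ void matmul_prefetch(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;

    // Prefetch this thread's elements of the first tile into registers.
    float a_next = A[row * n + threadIdx.x];
    float b_next = B[threadIdx.y * n + col];

    float acc = 0.0f;
    int tiles = n / TILE;
    for (int t = 0; t < tiles; ++t) {
        // Deposit the prefetched values into shared memory.
        As[threadIdx.y][threadIdx.x] = a_next;
        Bs[threadIdx.y][threadIdx.x] = b_next;
        __syncthreads();

        // Issue the global loads for the next tile before computing on this one.
        if (t + 1 < tiles) {
            a_next = A[row * n + (t + 1) * TILE + threadIdx.x];
            b_next = B[((t + 1) * TILE + threadIdx.y) * n + col];
        }

        // Arithmetic on the current tile; it does not depend on a_next or b_next.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];

        // Every thread must finish with this tile before it is overwritten.
        __syncthreads();
    }
    C[row * n + col] = acc;
}

int main() {
    const int n = 64;
    size_t bytes = n * n * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(TILE, TILE);
    dim3 grid(n / TILE, n / TILE);
    matmul_prefetch<<<grid, block>>>(A, B, C, n);
    cudaDeviceSynchronize();

    // With A all ones and B all twos, every element of C should equal 2 * n.
    bool ok = true;
    for (int i = 0; i < n * n; ++i) ok = ok && (C[i] == 2.0f * n);
    printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Whether this beats the version that loads straight into shared memory depends on the kernel and the hardware. The extra registers can reduce occupancy, so it is worth timing both variants.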

SPWorley · August 18, 2011, 5:05pm

While the compiler will indeed reorder things as best it can, you can still change your code to give the compiler more opportunities.
This usually means some manual unrolling of loops. “Prefetching” is a central idea within the general theme of increasing ILP (instruction-level parallelism).

The results can be quite measurable for some tight compute patterns.
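
As a rough sketch of what manual unrolling for ILP can look like (the kernel name scale_ilp4, the factor of 4, and the test harness are illustrative assumptions, not taken from any particular codebase): each thread issues four independent loads before using any of them, so several memory requests can be in flight per thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles 4 elements spaced one block-width apart,
// so neighbouring threads still access neighbouring addresses.
__global__ void scale_ilp4(const float *in, float *out, float s, int n) {
    int base = blockIdx.x * blockDim.x * 4 + threadIdx.x;

    // Four independent loads, all issued before any result is needed.
    float v[4];
    #pragma unroll
    for (int i = 0; i < 4; ++i) {
        int idx = base + i * blockDim.x;
        v[i] = (idx < n) ? in[idx] : 0.0f;
    }

    // Use the loaded values only after all loads have been issued.
    #pragma unroll
    for (int i = 0; i < 4; ++i) {
        int idx = base + i * blockDim.x;
        if (idx < n) out[idx] = s * v[i];
    }
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads * 4 - 1) / (threads * 4);

    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    scale_ilp4<<<blocks, threads>>>(in, out, 2.0f, n);
    cudaDeviceSynchronize();

    // Every output element should be exactly twice its input.
    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (out[i] == 2.0f * in[i]);
    printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(in); cudaFree(out);
    return 0;
}
```

The trade-off is register pressure: more values in flight per thread means more registers, so the best unroll factor has to be found by measurement.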

Here’s Vasily’s excellent (really excellent!) analysis and presentation from GTC 2010.

DrAnderson42 · August 18, 2011, 7:06pm

Indeed. One shouldn’t take quotes out of context. In the original quote, I must have been referring to some specific code or situation that was being described. Prefetching by hand can improve performance, but it will not always do so. I use it in a few select kernels in my code; in most kernels it makes no difference at all.

i_d · August 19, 2011, 9:22am

Thanks, that was the answer I was looking for. Cheers.

i_d · August 19, 2011, 9:23am

Thanks for the pdf. It looks really quite extensive.
