Do "prefetch" PTX instructions (CCTL) inherently include memory barriers?

YSAY · August 13, 2024, 11:09am

When I issue a CCTL instruction (the SASS generated for prefetch in PTX) ahead of an LDG instruction, I would expect a significant speedup, provided the CCTL can execute concurrently with the LDG. The improvement I actually measure is much smaller than expected. That would make sense if the LDG has to wait for the CCTL to complete. Does anyone know whether a CCTL instruction really stalls subsequent DRAM loads?
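For anyone who wants to reproduce the comparison, here is a minimal sketch of the kind of test I mean. It is not my actual kernel: the names `prefetch_l2`, `sum_kernel`, and `time_ms`, the launch configuration, and the prefetch distance of one grid stride are all made up for illustration. It only measures the difference between the two variants; it does not by itself show whether the CCTL blocks the LDG.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: issue a PTX prefetch hint into L2.
// Assumes p points into global memory (allocated with cudaMalloc).
__device__ __forceinline__ void prefetch_l2(const void* p) {
    asm volatile("prefetch.global.L2 [%0];" :: "l"(p));
}

// Grid-stride sum. When kPrefetch is true, each thread hints the element
// it will read on its next iteration before loading the current one.
template <bool kPrefetch>
__global__ void sum_kernel(const float* __restrict__ in, float* out, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n; i += stride) {
        if (kPrefetch && i + stride < n) {
            prefetch_l2(in + i + stride);
        }
        acc += in[i];
    }
    atomicAdd(out, acc);
}

// Time one launch of the chosen variant with CUDA events.
template <bool kPrefetch>
float time_ms(const float* d_in, float* d_out, size_t n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaMemset(d_out, 0, sizeof(float));
    cudaEventRecord(start);
    sum_kernel<kPrefetch><<<256, 256>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t n = 1 << 26;  // 64 Mi floats, chosen to exceed typical L2 sizes
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    time_ms<false>(d_in, d_out, n);  // warm-up launch, result discarded
    printf("without prefetch: %.3f ms\n", time_ms<false>(d_in, d_out, n));
    printf("with prefetch:    %.3f ms\n", time_ms<true>(d_in, d_out, n));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

It is worth checking with `cuobjdump -sass` that the prefetch variant really contains a CCTL ahead of the LDG, since ptxas may reorder or drop instructions.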
