The PTX ISA 8.5 documentation describes the prefetch instruction as follows: “The prefetch instruction brings the cache line containing the specified address in global or local memory address space into the specified cache level.”
On an NVIDIA A10 GPU, when I use the prefetch.L2 instruction, how large is the cache line that gets brought into the L2 cache?
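For context, here is a minimal sketch of how such a prefetch might be issued from CUDA C++ through inline PTX. It assumes managed memory and a hypothetical grid-stride reduction kernel (`sum_with_prefetch`, `run_sum` are illustrative names, not from any library). Because no state space is given, `prefetch.L2` uses generic addressing, so a plain device pointer can be passed. The sketch only shows how the instruction is emitted. It does not measure the line size, and the prefetch is a hint that does not change the computed result.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Prefetch hint: bring the cache line holding `p` into L2.
// No state space is given, so PTX uses generic addressing.
__device__ __forceinline__ void prefetch_l2(const void* p) {
    asm volatile("prefetch.L2 [%0];" :: "l"(p));
}

// Hypothetical grid-stride reduction that prefetches one stride ahead.
__global__ void sum_with_prefetch(const float* in, float* out, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        if (i + stride < n) prefetch_l2(in + i + stride);
        atomicAdd(out, in[i]);
    }
}

// Host helper: sums n ones on the GPU and returns the result.
float run_sum(int n) {
    float *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *out = 0.0f;
    sum_with_prefetch<<<64, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    float result = *out;
    cudaFree(in);
    cudaFree(out);
    return result;
}

int main() {
    float s = run_sum(1 << 20);
    printf("sum = %.0f\n", s);
    assert(s == 1048576.0f);  // every partial sum is an exact integer in float
    return 0;
}
```

Compiled with nvcc, the generated PTX should contain a prefetch.L2 line for each call to prefetch_l2, which is the instruction the question is about.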