Coalesced and conflict free memory access using cuda::memcpy_async/cp.async

TheRune · September 20, 2024, 11:00am

That does indeed sound very similar. Sadly, I still have not found a solution, and I have moved on to trying CUTLASS instead.

I also found this forum post from last year which seems to describe a similar issue, but with no answers.

A minimal example would be interesting; it might even be possible to find optimal access patterns by brute-force trial and error.
Alternatively, it should be possible to work out which patterns CUTLASS uses and reuse those, since its kernels do not seem to have this issue.
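To make the brute-force idea a bit more concrete, here is a rough host-side sketch of what such a search could look like. Everything in it is an assumption on my part rather than confirmed cp.async behaviour: the bank model (32 banks of 4 bytes, treated as 8 groups of 16 bytes, with 8 threads served per phase for 16-byte accesses), the 8x8 tile of 16-byte chunks, the XOR-style swizzle family, and all names (`Swizzle`, `physical_chunk`, `conflict_degree`, `worst_degree`, `find_best_swizzle`) are hypothetical. A real search would have to score candidates with profiler counters on the GPU instead of this model.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Modelling assumptions (not confirmed for cp.async by this thread):
constexpr int kChunksPerRow = 8;  // tile row = 8 chunks of 16 B = 128 B
constexpr int kBankGroups   = 8;  // 32 banks x 4 B viewed as 8 groups of 16 B
constexpr int kPhaseThreads = 8;  // threads served together for 16 B accesses

// Hypothetical swizzle family: col' = col ^ ((row >> shift) & mask).
struct Swizzle { int shift; int mask; };

// Physical 16-byte chunk index of the logical element (row, col).
int physical_chunk(Swizzle s, int row, int col) {
    assert(s.mask >= 0 && s.mask < kChunksPerRow);
    return row * kChunksPerRow + (col ^ ((row >> s.shift) & s.mask));
}

// Largest number of distinct chunks that fall into a single bank group
// within one phase; 1 means conflict-free under the model above.
int conflict_degree(const int (&chunks)[kPhaseThreads]) {
    std::set<int> per_group[kBankGroups];
    for (int c : chunks) per_group[c % kBankGroups].insert(c);
    int worst = 0;
    for (const auto& g : per_group)
        worst = std::max(worst, static_cast<int>(g.size()));
    return worst;
}

// Worst conflict degree over every row-wise and column-wise phase of the tile.
int worst_degree(Swizzle s) {
    int worst = 0;
    for (int fixed = 0; fixed < kChunksPerRow; ++fixed) {
        int row_access[kPhaseThreads];
        int col_access[kPhaseThreads];
        for (int t = 0; t < kPhaseThreads; ++t) {
            row_access[t] = physical_chunk(s, fixed, t);  // threads walk a row
            col_access[t] = physical_chunk(s, t, fixed);  // threads walk a column
        }
        worst = std::max({worst, conflict_degree(row_access),
                          conflict_degree(col_access)});
    }
    return worst;
}

// Exhaustively try every (shift, mask) pair and keep the first best one.
Swizzle find_best_swizzle() {
    Swizzle best{0, 0};  // start from the unswizzled layout
    int best_degree = worst_degree(best);
    for (int shift = 0; shift < 4; ++shift) {
        for (int mask = 0; mask < kChunksPerRow; ++mask) {
            Swizzle s{shift, mask};
            int d = worst_degree(s);
            if (d < best_degree) { best = s; best_degree = d; }
        }
    }
    return best;
}
```

Under this model the unswizzled layout (`mask = 0`) comes out as an 8-way conflict on the column-wise access, and the search settles on `shift = 0, mask = 7`, which is conflict-free in both directions. Whether the hardware path taken by cp.async follows the same rules is exactly the open question here, so this only narrows down candidates to test on a device.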
