How Does TMA Work for Writing from Shared Memory (sS) to Global Memory (gD)?


202476410arsmart December 23, 2024, 7:12am 1

Could you explain how TMA (Tensor Memory Accelerator) works? For example, when we write from the shared-memory tensor sS to the global-memory tensor gD, it seems that the data is written sequentially, i.e., sS[i] maps directly to gD[i]. Is this correct?
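One way to reason about the sS[i] to gD[i] question is a small host-side model. The sketch below is not CUDA or CuTe code; `GlobalTensor2D` and `store_tile` are hypothetical names made up for illustration. It assumes a 2D tile that sits densely in shared memory (row-major, no swizzle) and a global tensor described by its extents and a row stride, which is roughly the information a TMA descriptor carries. Under those assumptions, elements correspond by logical coordinate within the tile, so the linear offsets into sS and gD generally differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a 2D tiled store (not a CUDA or CuTe API).
struct GlobalTensor2D {
    std::size_t rows;        // logical number of rows in gD
    std::size_t cols;        // logical number of columns in gD
    std::size_t row_stride;  // elements between consecutive rows (>= cols)
};

// Copy a dense tile_rows x tile_cols tile (row-major, as assumed for shared
// memory) into the global tensor at tile origin (row0, col0).
// Elements that fall outside the global extents are skipped; this models the
// assumption that out-of-bounds parts of a tile are not written on a store.
void store_tile(const std::vector<float>& sS,
                std::size_t tile_rows, std::size_t tile_cols,
                std::vector<float>& gD, const GlobalTensor2D& g,
                std::size_t row0, std::size_t col0) {
    assert(sS.size() == tile_rows * tile_cols);
    assert(g.row_stride >= g.cols);
    assert(gD.size() >= g.rows * g.row_stride);

    for (std::size_t r = 0; r < tile_rows; ++r) {
        for (std::size_t c = 0; c < tile_cols; ++c) {
            const std::size_t gr = row0 + r;  // global row coordinate
            const std::size_t gc = col0 + c;  // global column coordinate
            if (gr >= g.rows || gc >= g.cols) {
                continue;  // outside gD: skip this element
            }
            // Same logical coordinate (r, c) in the tile, different offsets:
            // dense offset in sS, strided offset in gD.
            gD[gr * g.row_stride + gc] = sS[r * tile_cols + c];
        }
    }
}
```

For a 2x3 tile stored at origin (1, 4) in a 4x8 tensor with row stride 8, sS[3] (tile coordinate (1, 0)) lands at gD offset 2*8 + 4 = 20, not at gD[3]. So in this model "sS[i] maps to gD[i]" holds for logical tile coordinates, not for raw memory offsets.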
