As far as I understand from the PTX documentation, src can only be in the global or shared::cta state space, so we cannot reduce from another CTA's distributed shared memory (DSM) directly to global memory, right?