According to the CUDA Binary Utilities 12.2 documentation (nvidia.com), there is a new source/destination for the sm_90 architecture called “memory descriptors”.
In the SASS I can see them written as desc[urx][ury].
Could someone give more information on what this means?
AFAIK memory descriptors are used to define memory regions of interest for the Hopper TMA. I don’t think they are exposed in PTX or CUDA C++. I don’t have any further description or information.
My understanding of the Hopper TMA is that it is basically an asynchronous DMA engine for moving data around the GPU. DMA engines are commonly configured with descriptors that govern the details of transfers.
Given historical precedent, it seems unlikely that NVIDIA will publicly document the details of the descriptors. Some details may become apparent when load-descriptor instructions are spotted in the wild, that is, in disassembled code for the Hopper architecture. Apparently NVIDIA has a patent application on this mechanism, which may or may not contain some details (throughout my employment in the computer industry I was told repeatedly that reading patents or patent applications is not a good idea).
From here:
“TMA operations are launched using a copy descriptor that specifies data transfers using tensor dimensions and block coordinates instead of per-element addressing (Figure 15)…”
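To make the quoted sentence a bit more concrete, here is a minimal host-side C++ sketch of the idea of "descriptor plus block coordinates" addressing. Everything in it is an assumption made for illustration: the `CopyDescriptor` struct, its fields, the `copy_tile` function, and the zero-fill of out-of-bounds elements are hypothetical and are not the real Hopper descriptor layout, nor what `desc[URx]` holds in SASS.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical descriptor for a row-major 2D tensor.
// NOT the real hardware format; it only models the concept.
struct CopyDescriptor {
    const float* base;                // start of the tensor in memory
    std::array<std::size_t, 2> dims;  // tensor extent: {rows, cols}
    std::array<std::size_t, 2> box;   // tile extent:   {rows, cols}
};

// Copy the tile selected by block coordinates (block_row, block_col).
// The caller never computes per-element addresses; the descriptor does.
// Elements outside the tensor are zero-filled (a choice made for this sketch).
std::vector<float> copy_tile(const CopyDescriptor& d,
                             std::size_t block_row,
                             std::size_t block_col) {
    std::vector<float> tile(d.box[0] * d.box[1], 0.0f);
    const std::size_t row0 = block_row * d.box[0];
    const std::size_t col0 = block_col * d.box[1];
    for (std::size_t r = 0; r < d.box[0]; ++r) {
        for (std::size_t c = 0; c < d.box[1]; ++c) {
            const std::size_t src_row = row0 + r;
            const std::size_t src_col = col0 + c;
            if (src_row < d.dims[0] && src_col < d.dims[1]) {
                tile[r * d.box[1] + c] = d.base[src_row * d.dims[1] + src_col];
            }
        }
    }
    return tile;
}
```

The point of the sketch is only that the transfer is described once (base, dimensions, tile shape) and then requested by tile coordinates, which is how I read the quote.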
I found that SASS generated for sm_86 also contains desc operands, even though Ampere doesn’t support the TMA:
/*0000*/ MOV R1, c[0x0][0x28] ;
/*0010*/ S2R R2, SR_CTAID.X ;
/*0020*/ MOV R9, 0x4 ;
/*0030*/ ULDC.64 UR4, c[0x0][0x118] ;
/*0040*/ S2R R3, SR_TID.X ;
/*0050*/ IMAD R2, R2, c[0x0][0x0], R3 ;
/*0060*/ IMAD.WIDE R6, R2, R9, c[0x0][0x170] ;
/*0070*/ IMAD.WIDE R8, R2, R9, c[0x0][0x178] ;
/*0080*/ LDG.E R3, desc[UR4][R6.64] ;
/*0090*/ LDG.E R4, desc[UR4][R8.64] ;
/*00a0*/ SHF.R.S32.HI R5, RZ, 0x1f, R2 ;
/*00b0*/ BAR.SYNC.DEFER_BLOCKING 0x0 ;
/*00c0*/ CS2R.32 R0, SR_CLOCKLO ;
/*00d0*/ MOV R8, RZ ;
/*00e0*/ MOV R6, RZ ;
They seem to have been introduced for normal memory operations as early as compute capability 8.0.
Could it be related to NvSciBuf?