SeqData and Multi-head Attention


These questions concern cudnnSetAttnDescriptor() and the forward, backward-weights, and backward-data APIs (cudnnMultiHeadAttnForward(), cudnnMultiHeadAttnBackwardWeights(), and cudnnMultiHeadAttnBackwardData()).

(1) What Q, K, and V input data layouts are required in global memory when the corresponding projection sizes (qProjSize, kProjSize, vProjSize) are set to zero (i.e., the inputs are already the outputs of the corresponding linear layers)? Specifically, does the vector dimension have size [numHeads * headDim], with each head's headDim values contiguous in global memory?

(2) Why does oSize equal numHeads * vSize when neither the V nor the O linear layer is included? This requirement seems to consume more global memory than standard attention does (numHeads times more, to be precise).

Hi @jundaf3,
The link below should help: