SeqData and Multi-head Attention

Hi,

I have two questions about cudnnSetAttnDescriptor() and the forward, backward-weight, and backward-data attention APIs.

(1) What Q, K, and V input data layouts are required in global memory when the corresponding Q, K, and V projection sizes are set to zero (i.e. the inputs are already the outputs of the corresponding linear layers)? Specifically, does the vector dimension have size [numHeads*headDim], with headDim contiguous in global memory?
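For concreteness, this is roughly how I describe the Q tensor today, assuming the answer to (1) is that the vector dimension is numHeads*headDim with headDim innermost; batchSize, seqLen, numHeads, and headDim are placeholder names of my own:

```cpp
// Sketch: describe a (batch, sequence, numHeads*headDim) tensor as cuDNN SeqData,
// assuming headDim is contiguous (innermost) in global memory.
#include <cudnn.h>
#include <vector>

cudnnSeqDataDescriptor_t makeSeqDataDesc(int batchSize, int seqLen,
                                         int numHeads, int headDim)
{
    cudnnSeqDataDescriptor_t desc;
    cudnnCreateSeqDataDescriptor(&desc);

    int dimA[CUDNN_SEQDATA_DIM_COUNT];
    dimA[CUDNN_SEQDATA_BEAM_DIM]  = 1;
    dimA[CUDNN_SEQDATA_BATCH_DIM] = batchSize;
    dimA[CUDNN_SEQDATA_TIME_DIM]  = seqLen;
    dimA[CUDNN_SEQDATA_VECT_DIM]  = numHeads * headDim;  // the layout I am asking about

    // axes[] is ordered from outermost to innermost; VECT_DIM must be innermost.
    cudnnSeqDataAxis_t axes[CUDNN_SEQDATA_DIM_COUNT] = {
        CUDNN_SEQDATA_BEAM_DIM,
        CUDNN_SEQDATA_BATCH_DIM,
        CUDNN_SEQDATA_TIME_DIM,
        CUDNN_SEQDATA_VECT_DIM,
    };

    // One sequence length per (batch, beam) entry; all full-length here.
    std::vector<int> seqLengths(batchSize, seqLen);

    cudnnSetSeqDataDescriptor(desc, CUDNN_DATA_FLOAT, CUDNN_SEQDATA_DIM_COUNT,
                              dimA, axes,
                              seqLengths.size(), seqLengths.data(),
                              nullptr /* paddingFill: leave padding untouched */);
    return desc;
}
```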

(2) Why does oSize equal numHeads*vSize when neither the V nor the O linear layer is included? This requirement seems to consume numHeads times more global memory for the output than standard attention does.
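And this is the descriptor configuration behind question (2), with all four projections disabled so that cuDNN should act as multi-head SDPA on already-projected Q/K/V; the attnMode flags, the softmax scaler, and the placeholder names are my own choices:

```cpp
// Sketch: attention descriptor with all four projections disabled
// (qProjSize = kProjSize = vProjSize = oProjSize = 0).
#include <cudnn.h>
#include <cmath>

void setupAttnDesc(cudnnAttnDescriptor_t attnDesc,
                   cudnnDropoutDescriptor_t attnDropout,
                   cudnnDropoutDescriptor_t postDropout,
                   int numHeads, int headDim, int maxSeqLen, int maxBatch)
{
    const int vecSize = numHeads * headDim;  // length of each Q/K/V vector in memory

    cudnnSetAttnDescriptor(
        attnDesc,
        CUDNN_ATTN_QUERYMAP_ALL_TO_ONE | CUDNN_ATTN_DISABLE_PROJ_BIASES,
        numHeads,
        1.0 / std::sqrt(static_cast<double>(headDim)),  // smScaler
        CUDNN_DATA_FLOAT,    // data type
        CUDNN_DATA_FLOAT,    // compute precision
        CUDNN_DEFAULT_MATH,
        attnDropout,
        postDropout,
        vecSize,             // qSize
        vecSize,             // kSize
        vecSize,             // vSize
        0,                   // qProjSize = 0: no Q linear layer
        0,                   // kProjSize = 0: no K linear layer
        0,                   // vProjSize = 0: no V linear layer
        0,                   // oProjSize = 0: no output linear layer
        maxSeqLen,           // qoMaxSeqLength
        maxSeqLen,           // kvMaxSeqLength
        maxBatch,            // maxBatchSize
        1);                  // maxBeamSize
}
```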

Hi @jundaf3,
The link below should help:
https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetAttnDescriptor

Thanks

Could you provide a more detailed explanation? I do not see an answer to the first question in the current documentation. I am trying to build an attention implementation where Q, K, and V are already projected into separate heads, i.e. Q, K, and V have shape (batch, sequence, num heads, head dim) or (sequence, batch, num heads, head dim).

I am currently unable to make the cudnnMultiHeadAttnForward call act as a plain SDPA call over multiple heads that also projects the output. An answer to the first question would be very helpful in accomplishing that. My SDPA implementation works when there is one head, but I am not sure how to tell cuDNN that the input is already projected into separate heads; the nHeads parameter in cudnnSetAttnDescriptor does not appear to do that. For reference, the forward call I am attempting looks roughly like the sketch below.
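This is a minimal sketch of that inference-mode call; the descriptors, device buffers, and the weight/workspace sizes (obtained via cudnnGetMultiHeadAttnBuffers) are assumed to be set up elsewhere, and the window arrays simply open attention over the full K/V sequence:

```cpp
// Sketch: inference-mode forward call with the full K/V window open for every
// query position. All descriptors and buffers are assumed to be prepared elsewhere.
#include <cudnn.h>
#include <vector>

void runAttnForward(cudnnHandle_t handle, cudnnAttnDescriptor_t attnDesc,
                    cudnnSeqDataDescriptor_t qDesc, const void* devQ,
                    cudnnSeqDataDescriptor_t kDesc, const void* devK,
                    cudnnSeqDataDescriptor_t vDesc, const void* devV,
                    cudnnSeqDataDescriptor_t oDesc, void* devO,
                    const int* devSeqLenQO, const int* devSeqLenKV,
                    int qoMaxSeqLen, int kvMaxSeqLen,
                    size_t weightBytes, const void* devWeights,
                    size_t workBytes, void* devWorkspace)
{
    // Attention window per Q time step: [loWin[i], hiWin[i]) over the K/V sequence.
    std::vector<int> loWin(qoMaxSeqLen, 0);
    std::vector<int> hiWin(qoMaxSeqLen, kvMaxSeqLen);

    cudnnMultiHeadAttnForward(
        handle, attnDesc,
        -1,                          // currIdx < 0: process all Q time steps
        loWin.data(), hiWin.data(),
        devSeqLenQO, devSeqLenKV,    // per-sequence lengths, in device memory
        qDesc, devQ,
        nullptr,                     // residuals: not used here
        kDesc, devK,
        vDesc, devV,
        oDesc, devO,
        weightBytes, devWeights,     // sized/allocated per cudnnGetMultiHeadAttnBuffers
        workBytes, devWorkspace,
        0, nullptr);                 // no reserve space: inference mode
}
```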