Incomplete documentation of cudnnMultiHeadAttnForward


I’m interested in using cudnnMultiHeadAttnForward for inference but I find the documentation lacking.

Unless I missed it somewhere else, the documentation does not describe the layout of the weight buffer w. Could you describe how to build this argument?

Also, does the function support biases, which are commonly used in implementations of this layer?

On a related point, the documentation says that currIdx should be >= 0 in inference, while the release notes for 7.5.1 say that it can be negative.



I have the same request:

  • the documentation for cudnnSetAttnDescriptor() does not explain the expected parameters
  • there is no example of MHA in the cuDNN samples package

Looks like this has been addressed in the latest version: