Hi,
I’m interested in using cudnnMultiHeadAttnForward for inference but I find the documentation lacking.
Unless I missed it somewhere else, the documentation does not describe the layout of the weight buffer w. Could you describe how to build this argument?
Also, does the function support biases, which are commonly used in implementations of this layer?
On a related point, the documentation says that currIdx should be >= 0 in inference, while the release notes for 7.5.1 say that it can be negative. Which one is correct?
Thanks,
Guillaume