Multi-head attention performance

user102255 · August 11, 2022, 8:57am

According to this paper (*), “cuDNN’s performance is orders of magnitude worse” than that of e.g. PyTorch or TensorFlow. This statement refers to version 7.6.5 and is hopefully outdated, as major improvements were announced with cuDNN 8.3.0. Are there any current benchmarks or comparisons of multi-head attention for version 8.3 or later?

(*) http://www.unixer.de/publications/img/data_movement_is_all_you_need.pdf

spolisetty · August 12, 2022, 2:21pm

Hi,

I don't think we have recent benchmarks for multi-head attention. We will check and get back to you if any are available.
For the latest changes in cuDNN, please refer to the release notes.

Thank you.
