We have built an open-source SASS assembler for Volta and Turing GPUs.
You can access the tool at:
Our sincere hope is that this tool can help people who want to achieve bare-metal performance on Nvidia GPUs.
With the help of this tool, we completed two performance optimization works.
The first one is about optimizing Winograd convolution on Volta and Turing GPUs (following Andrew Lavin and Scott Gray’s great work on Maxwell). This work has been accepted at PPoPP’20 (San Diego, CA, 22-26 Feb), and I will present it there. I hope to meet you in San Diego!
The second work is about optimizing Tensor Core-based HGEMM on Turing GPUs. This work has been accepted at IPDPS’20 (New Orleans, LA, 18-22 May). I also hope to meet fellow GPU enthusiasts there!
I believe these two works can help Nvidia library writers and inspire more work in this area.
Looking forward to your comments :P