It's SASS, so it's not well documented. However, you can get some insight by studying the corresponding PTX instruction. This may be of interest.