It's SASS, so it's not well documented. However, you can get some insight by studying the corresponding PTX instruction. This may be of interest.