Kepler max effective FMA throughput

I’m trying to benchmark the effective FMA throughput of CUDA GPUs. I’ve written a kernel that executes a long series of independent FFMA instructions in the following pattern (SASS dumped with cuobjdump):

...
        /*00d8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*00e0*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*00e8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*00f0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*00f8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
                                                                                /* 0x2202020200421047 */
        /*0108*/                FFMA.FTZ R4, R21, R21, R21;                     /* 0x302a000055511c40 */
        /*0110*/                FFMA.FTZ R5, R20, R20, R20;                     /* 0x3028000051415c40 */
        /*0118*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0120*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0128*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0130*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0138*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
                                                                                /* 0x2202020200420047 */
        /*0148*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0150*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0158*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0160*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0168*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0170*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0178*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
                                                                                /* 0x2202020200420047 */
        /*0188*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0190*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0198*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*01a0*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*01a8*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*01b0*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*01b8*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
                                                                                /* 0x2202020200420047 */
        /*01c8*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*01d0*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*01d8*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*01e0*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*01e8*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*01f0*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*01f8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
                                                                                /* 0x2202020200420047 */
        /*0208*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0210*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0218*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0220*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0228*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0230*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0238*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
                                                                                /* 0x2202020200420047 */
        /*0248*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0250*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0258*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0260*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0268*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0270*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0278*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
                                                                                /* 0x2202020200420047 */
        /*0288*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0290*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0298*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*02a0*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*02a8*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*02b0*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*02b8*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
                                                                                /* 0x2202020200420047 */
        /*02c8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*02d0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*02d8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*02e0*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*02e8*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*02f0*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*02f8*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
                                                                                /* 0x2202020200420047 */
        /*0308*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0310*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0318*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0320*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0328*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0330*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0338*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
                                                                                /* 0x2202020200420047 */
        /*0348*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0350*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0358*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0360*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0368*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0370*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0378*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
                                                                                /* 0x2202020200420047 */
        /*0388*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0390*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0398*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*03a0*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*03a8*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*03b0*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*03b8*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
                                                                                /* 0x2202020042004207 */
        /*03c8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*03d0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*03d8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*03e0*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*03e8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*03f0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*03f8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
                                                                                /* 0x2202020200420047 */
        /*0408*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0410*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0418*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0420*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0428*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0430*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0438*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
                                                                                /* 0x2202020042020047 */
        /*0448*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0450*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0458*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0460*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0468*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0470*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0478*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
                                                                                /* 0x2202020200420047 */
        /*0488*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0490*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0498*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*04a0*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*04a8*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*04b0*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*04b8*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
                                                                                /* 0x2202020200420047 */
        /*04c8*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*04d0*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*04d8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*04e0*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*04e8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*04f0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*04f8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
                                                                                /* 0x2202020200420047 */
        /*0508*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0510*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0518*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0520*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0528*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0530*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0538*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
                                                                                /* 0x2202020200420047 */
        /*0548*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0550*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0558*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0560*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0568*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0570*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0578*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
                                                                                /* 0x2202020200420047 */
        /*0588*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0590*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0598*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*05a0*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*05a8*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*05b0*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*05b8*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
                                                                                /* 0x2202020200420047 */
        /*05c8*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*05d0*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*05d8*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*05e0*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*05e8*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*05f0*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*05f8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
                                                                                /* 0x2202020200420047 */
        /*0608*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0610*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0618*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0620*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0628*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0630*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0638*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
                                                                                /* 0x2202020200420047 */
        /*0648*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0650*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0658*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0660*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0668*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0670*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0678*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
                                                                                /* 0x2202020200420047 */
        /*0688*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0690*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0698*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*06a0*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*06a8*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*06b0*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*06b8*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
                                                                                /* 0x2202020200420047 */
        /*06c8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*06d0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*06d8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*06e0*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*06e8*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*06f0*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*06f8*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
                                                                                /* 0x2202020200420047 */
        /*0708*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0710*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0718*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0720*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0728*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0730*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0738*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
                                                                                /* 0x2202020200420047 */
        /*0748*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0750*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0758*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0760*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0768*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0770*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0778*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
                                                                                /* 0x2202020200420047 */
        /*0788*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0790*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0798*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*07a0*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*07a8*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*07b0*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*07b8*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
                                                                                /* 0x2202020042004207 */
        /*07c8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*07d0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*07d8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*07e0*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*07e8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*07f0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*07f8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
                                                                                /* 0x2202020200420047 */
        /*0808*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0810*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0818*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0820*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0828*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0830*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0838*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
                                                                                /* 0x2202020042020047 */
        /*0848*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0850*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0858*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0860*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0868*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0870*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0878*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
                                                                                /* 0x2202020200420047 */
        /*0888*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0890*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0898*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*08a0*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*08a8*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*08b0*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*08b8*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
                                                                                /* 0x2202020200420047 */
        /*08c8*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*08d0*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*08d8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*08e0*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*08e8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*08f0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*08f8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
                                                                                /* 0x2202020200420047 */
        /*0908*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0910*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0918*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0920*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0928*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0930*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0938*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
                                                                                /* 0x2202020200420047 */
        /*0948*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0950*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0958*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0960*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0968*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0970*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0978*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
                                                                                /* 0x2202020200420047 */
        /*0988*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0990*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0998*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*09a0*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*09a8*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*09b0*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*09b8*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
                                                                                /* 0x2202020200420047 */
        /*09c8*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*09d0*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*09d8*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*09e0*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*09e8*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*09f0*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*09f8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
                                                                                /* 0x2202020200420047 */
        /*0a08*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0a10*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0a18*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0a20*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0a28*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0a30*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0a38*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
                                                                                /* 0x2202020200420047 */
        /*0a48*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0a50*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0a58*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0a60*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0a68*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0a70*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0a78*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
                                                                                /* 0x2202020200420047 */
        /*0a88*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0a90*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0a98*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0aa0*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0aa8*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0ab0*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0ab8*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
                                                                                /* 0x2202020200420047 */
        /*0ac8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0ad0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0ad8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0ae0*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0ae8*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0af0*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0af8*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
                                                                                /* 0x2202020200420047 */
        /*0b08*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0b10*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0b18*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0b20*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0b28*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0b30*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0b38*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
                                                                                /* 0x2202020200420047 */
        /*0b48*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0b50*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0b58*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0b60*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0b68*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0b70*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0b78*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
                                                                                /* 0x2202020200420047 */
        /*0b88*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0b90*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0b98*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0ba0*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0ba8*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0bb0*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0bb8*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
                                                                                /* 0x2202020042004207 */
        /*0bc8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0bd0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0bd8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0be0*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0be8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0bf0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0bf8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
                                                                                /* 0x2202020200420047 */
        /*0c08*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0c10*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0c18*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0c20*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0c28*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0c30*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0c38*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
                                                                                /* 0x2202020042020047 */
        /*0c48*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0c50*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0c58*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0c60*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0c68*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0c70*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0c78*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
                                                                                /* 0x2202020200420047 */
        /*0c88*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0c90*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0c98*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0ca0*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0ca8*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0cb0*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0cb8*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
                                                                                /* 0x2202020200420047 */
        /*0cc8*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0cd0*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0cd8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0ce0*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0ce8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0cf0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0cf8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
                                                                                /* 0x2202020200420047 */
        /*0d08*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0d10*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0d18*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0d20*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0d28*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0d30*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0d38*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
                                                                                /* 0x2202020200420047 */
        /*0d48*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0d50*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0d58*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0d60*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0d68*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0d70*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0d78*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
                                                                                /* 0x2202020200420047 */
        /*0d88*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0d90*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0d98*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0da0*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0da8*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0db0*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0db8*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
                                                                                /* 0x2202020200420047 */
        /*0dc8*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0dd0*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0dd8*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0de0*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0de8*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0df0*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0df8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
                                                                                /* 0x2202020200420047 */
        /*0e08*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0e10*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0e18*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0e20*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0e28*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0e30*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0e38*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
                                                                                /* 0x2202020200420047 */
        /*0e48*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0e50*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0e58*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0e60*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0e68*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0e70*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0e78*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
                                                                                /* 0x2202020200420047 */
        /*0e88*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0e90*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0e98*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0ea0*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0ea8*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0eb0*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0eb8*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
                                                                                /* 0x2202020200420047 */
        /*0ec8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0ed0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0ed8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0ee0*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0ee8*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*0ef0*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0ef8*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
                                                                                /* 0x2202020200420047 */
        /*0f08*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0f10*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0f18*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0f20*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0f28*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*0f30*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0f38*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
                                                                                /* 0x2202020200420047 */
        /*0f48*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0f50*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0f58*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0f60*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0f68*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*0f70*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*0f78*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
                                                                                /* 0x2202004220420047 */
        /*0f88*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*0f90*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*0f98*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*0fa0*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*0fa8*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*0fb0*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*0fb8*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
                                                                                /* 0x2200422042004207 */
        /*0fc8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*0fd0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*0fd8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*0fe0*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*0fe8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*0ff0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*0ff8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
                                                                                /* 0x2212004200422047 */
        /*1008*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*1010*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*1018*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*1020*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*1028*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*1030*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*1038*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
                                                                                /* 0x2210421042020047 */
        /*1048*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*1050*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*1058*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*1060*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*1068*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*1070*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*1078*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
                                                                                /* 0x2202004210421047 */
        /*1088*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*1090*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*1098*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*10a0*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*10a8*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*10b0*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*10b8*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
                                                                                /* 0x2210420200421047 */
        /*10c8*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*10d0*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*10d8*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*10e0*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*10e8*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*10f0*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*10f8*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
                                                                                /* 0x2202004210421047 */
        /*1108*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*1110*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*1118*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*1120*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*1128*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*1130*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*1138*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
                                                                                /* 0x2202004210421047 */
        /*1148*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*1150*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*1158*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*1160*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
        /*1168*/                IADD R0, R0, 0x20;                              /* 0x4800c00080001c03 */
        /*1170*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*1178*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
                                                                                /* 0x2202004210421047 */
        /*1188*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*1190*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*1198*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*11a0*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
        /*11a8*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*11b0*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*11b8*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
                                                                                /* 0x2202004210421047 */
        /*11c8*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*11d0*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*11d8*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*11e0*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
        /*11e8*/                FFMA.FTZ R12, R12, R12, R12;                    /* 0x3018000030c31c40 */
        /*11f0*/                FFMA.FTZ R13, R13, R13, R13;                    /* 0x301a000034d35c40 */
        /*11f8*/                FFMA.FTZ R14, R14, R14, R14;                    /* 0x301c000038e39c40 */
                                                                                /* 0x2202004210421047 */
        /*1208*/                ISETP.NE.AND P0, PT, R0, 0x1000, PT;            /* 0x1a8ec0400001dc23 */
        /*1210*/                FFMA.FTZ R15, R15, R15, R15;                    /* 0x301e00003cf3dc40 */
        /*1218*/                FFMA.FTZ R16, R16, R16, R16;                    /* 0x3020000041041c40 */
        /*1220*/                FFMA.FTZ R17, R17, R17, R17;                    /* 0x3022000045145c40 */
        /*1228*/                FFMA.FTZ R19, R19, R19, R19;                    /* 0x302600004d34dc40 */
        /*1230*/                FFMA.FTZ R18, R18, R18, R18;                    /* 0x3024000049249c40 */
        /*1238*/                FFMA.FTZ R4, R4, R4, R4;                        /* 0x3008000010411c40 */
                                                                                /* 0x2202004210421047 */
        /*1248*/                FFMA.FTZ R5, R5, R5, R5;                        /* 0x300a000014515c40 */
        /*1250*/                FFMA.FTZ R6, R6, R6, R6;                        /* 0x300c000018619c40 */
        /*1258*/                FFMA.FTZ R7, R7, R7, R7;                        /* 0x300e00001c71dc40 */
        /*1260*/                FFMA.FTZ R8, R8, R8, R8;                        /* 0x3010000020821c40 */
        /*1268*/                FFMA.FTZ R9, R9, R9, R9;                        /* 0x3012000024925c40 */
        /*1270*/                FFMA.FTZ R10, R10, R10, R10;                    /* 0x3014000028a29c40 */
        /*1278*/                FFMA.FTZ R11, R11, R11, R11;                    /* 0x301600002cb2dc40 */
                                                                          
...

When I run it on Fermi GPUs, the results are very close to the theoretical numbers. On Kepler GPUs, however, they are lower than expected. This is especially evident on a Tesla K40m, where the kernel reaches 3,014.91 GFLOPS (peak: 4,291 GFLOPS, 70% efficiency). The same kernel adapted for DP gives 1,373.83 GFLOPS, which is much closer to the theoretical peak (1,430 GFLOPS, 96% efficiency).
Am I missing something about Kepler?
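
For reference, the peak and efficiency figures quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the K40m parameters used here (15 SMX, 745 MHz base clock, 192 SP and 64 DP units per SMX) are assumptions that do not appear in the post itself, and the function names are made up for illustration.

```python
def peak_gflops(sm_count, units_per_sm, clock_ghz):
    """Theoretical peak, assuming each unit retires one FMA (2 FLOPs) per cycle."""
    return sm_count * units_per_sm * 2 * clock_ghz


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak that the benchmark actually reached."""
    return measured_gflops / peak


if __name__ == "__main__":
    # Assumed Tesla K40m parameters (not stated in the post).
    SMX_COUNT = 15
    BASE_CLOCK_GHZ = 0.745

    sp_peak = peak_gflops(SMX_COUNT, 192, BASE_CLOCK_GHZ)  # single precision
    dp_peak = peak_gflops(SMX_COUNT, 64, BASE_CLOCK_GHZ)   # double precision

    # Measured values are the ones reported in the post.
    print(f"SP peak {sp_peak:.1f} GFLOPS, efficiency {efficiency(3014.91, sp_peak):.1%}")
    print(f"DP peak {dp_peak:.1f} GFLOPS, efficiency {efficiency(1373.83, dp_peak):.1%}")
```

Under those assumptions the peaks come out at roughly 4,291 and 1,430 GFLOPS, in line with the numbers quoted above. If the card runs at a boost clock during the benchmark, the real peak is higher and the efficiency correspondingly lower.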

ekon,

Would you be willing to share your test code? I would be interested in poking at this a bit as well. I don’t have any good theories right now for why you do not see closer-to-peak performance.

Well, I guess this has to do with the undocumented register bank conflicts that show up on Kepler. A paper on matrix multiplication investigates this issue in more detail:

http://hal.archives-ouvertes.fr/docs/00/78/99/58/PDF/112_Lai.pdf

As far as I know, the third set of FPUs/“cores” per SMX on Kepler appears to be starved of register bandwidth. I’ve observed problems similar to yours and to those laid out in the paper you cited.
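
As a back-of-envelope check of that reading (a sketch, not a model of the hardware): if only two of the three sets of SP units could be kept busy, the achievable rate would be two thirds of peak. The function name below is hypothetical; the peak and measured figures are the ones from the original post.

```python
def starved_ceiling(peak_gflops, fed_sets=2, total_sets=3):
    """Throughput ceiling if only `fed_sets` of `total_sets` unit groups are kept busy."""
    return peak_gflops * fed_sets / total_sets


if __name__ == "__main__":
    SP_PEAK = 4291.0      # GFLOPS, K40m peak as quoted in the original post
    SP_MEASURED = 3014.91  # GFLOPS, as measured in the original post

    ceiling = starved_ceiling(SP_PEAK)
    print(f"two-thirds ceiling: {ceiling:.1f} GFLOPS ({ceiling / SP_PEAK:.1%} of peak)")
    print(f"measured:           {SP_MEASURED:.1f} GFLOPS ({SP_MEASURED / SP_PEAK:.1%} of peak)")
```

The measured 70% sits slightly above the two-thirds mark, which would be consistent with the third set being fed only occasionally, though this arithmetic alone cannot confirm the cause.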

I wonder if this had anything to do with Maxwell reducing the SM “core count” from 192 to 128…

http://docs.nvidia.com/cuda/maxwell-tuning-guide/index.html#axzz30Z3oclDC

:-)

I believe it has a lot to do with it. I never understood why Nvidia went for 192 in the first place; they should have known from their Fermi experience (but it might have been too late for a redesign by then).