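I’m trying to benchmark the effective arithmetic throughput of CUDA GPUs. I’ve written a kernel that executes a long series of single-precision FFMA (fused multiply-add) instructions, cycling over 16 independent registers (R4 to R19) so that consecutive instructions do not depend on each other. The generated code follows this pattern (SASS dumped with cuobjdump):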
...
/*00d8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*00e0*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*00e8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*00f0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*00f8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/* 0x2202020200421047 */
/*0108*/ FFMA.FTZ R4, R21, R21, R21; /* 0x302a000055511c40 */
/*0110*/ FFMA.FTZ R5, R20, R20, R20; /* 0x3028000051415c40 */
/*0118*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0120*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0128*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0130*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0138*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/* 0x2202020200420047 */
/*0148*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0150*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0158*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0160*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0168*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0170*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0178*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/* 0x2202020200420047 */
/*0188*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0190*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0198*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*01a0*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*01a8*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*01b0*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*01b8*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/* 0x2202020200420047 */
/*01c8*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*01d0*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*01d8*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*01e0*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*01e8*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*01f0*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*01f8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/* 0x2202020200420047 */
/*0208*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0210*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0218*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0220*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0228*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0230*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0238*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/* 0x2202020200420047 */
/*0248*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0250*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0258*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0260*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0268*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0270*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0278*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/* 0x2202020200420047 */
/*0288*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0290*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0298*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*02a0*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*02a8*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*02b0*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*02b8*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/* 0x2202020200420047 */
/*02c8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*02d0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*02d8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*02e0*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*02e8*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*02f0*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*02f8*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/* 0x2202020200420047 */
/*0308*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0310*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0318*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0320*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0328*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0330*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0338*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/* 0x2202020200420047 */
/*0348*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0350*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0358*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0360*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0368*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0370*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0378*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/* 0x2202020200420047 */
/*0388*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0390*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0398*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*03a0*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*03a8*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*03b0*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*03b8*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/* 0x2202020042004207 */
/*03c8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*03d0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*03d8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*03e0*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*03e8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*03f0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*03f8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/* 0x2202020200420047 */
/*0408*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0410*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0418*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0420*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0428*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0430*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0438*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/* 0x2202020042020047 */
/*0448*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0450*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0458*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0460*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0468*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0470*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0478*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/* 0x2202020200420047 */
/*0488*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0490*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0498*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*04a0*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*04a8*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*04b0*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*04b8*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/* 0x2202020200420047 */
/*04c8*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*04d0*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*04d8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*04e0*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*04e8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*04f0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*04f8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/* 0x2202020200420047 */
/*0508*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0510*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0518*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0520*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0528*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0530*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0538*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/* 0x2202020200420047 */
/*0548*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0550*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0558*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0560*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0568*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0570*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0578*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/* 0x2202020200420047 */
/*0588*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0590*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0598*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*05a0*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*05a8*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*05b0*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*05b8*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/* 0x2202020200420047 */
/*05c8*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*05d0*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*05d8*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*05e0*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*05e8*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*05f0*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*05f8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/* 0x2202020200420047 */
/*0608*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0610*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0618*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0620*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0628*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0630*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0638*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/* 0x2202020200420047 */
/*0648*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0650*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0658*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0660*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0668*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0670*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0678*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/* 0x2202020200420047 */
/*0688*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0690*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0698*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*06a0*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*06a8*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*06b0*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*06b8*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/* 0x2202020200420047 */
/*06c8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*06d0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*06d8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*06e0*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*06e8*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*06f0*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*06f8*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/* 0x2202020200420047 */
/*0708*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0710*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0718*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0720*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0728*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0730*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0738*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/* 0x2202020200420047 */
/*0748*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0750*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0758*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0760*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0768*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0770*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0778*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/* 0x2202020200420047 */
/*0788*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0790*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0798*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*07a0*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*07a8*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*07b0*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*07b8*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/* 0x2202020042004207 */
/*07c8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*07d0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*07d8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*07e0*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*07e8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*07f0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*07f8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/* 0x2202020200420047 */
/*0808*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0810*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0818*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0820*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0828*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0830*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0838*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/* 0x2202020042020047 */
/*0848*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0850*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0858*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0860*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0868*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0870*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0878*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/* 0x2202020200420047 */
/*0888*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0890*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0898*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*08a0*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*08a8*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*08b0*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*08b8*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/* 0x2202020200420047 */
/*08c8*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*08d0*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*08d8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*08e0*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*08e8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*08f0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*08f8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/* 0x2202020200420047 */
/*0908*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0910*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0918*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0920*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0928*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0930*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0938*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/* 0x2202020200420047 */
/*0948*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0950*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0958*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0960*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0968*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0970*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0978*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/* 0x2202020200420047 */
/*0988*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0990*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0998*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*09a0*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*09a8*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*09b0*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*09b8*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/* 0x2202020200420047 */
/*09c8*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*09d0*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*09d8*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*09e0*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*09e8*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*09f0*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*09f8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/* 0x2202020200420047 */
/*0a08*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0a10*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0a18*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0a20*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0a28*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0a30*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0a38*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/* 0x2202020200420047 */
/*0a48*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0a50*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0a58*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0a60*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0a68*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0a70*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0a78*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/* 0x2202020200420047 */
/*0a88*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0a90*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0a98*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0aa0*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0aa8*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0ab0*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0ab8*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/* 0x2202020200420047 */
/*0ac8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0ad0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0ad8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0ae0*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0ae8*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0af0*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0af8*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/* 0x2202020200420047 */
/*0b08*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0b10*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0b18*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0b20*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0b28*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0b30*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0b38*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/* 0x2202020200420047 */
/*0b48*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0b50*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0b58*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0b60*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0b68*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0b70*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0b78*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/* 0x2202020200420047 */
/*0b88*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0b90*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0b98*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0ba0*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0ba8*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0bb0*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0bb8*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/* 0x2202020042004207 */
/*0bc8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0bd0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0bd8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0be0*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0be8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0bf0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0bf8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/* 0x2202020200420047 */
/*0c08*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0c10*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0c18*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0c20*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0c28*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0c30*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0c38*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/* 0x2202020042020047 */
/*0c48*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0c50*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0c58*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0c60*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0c68*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0c70*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0c78*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/* 0x2202020200420047 */
/*0c88*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0c90*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0c98*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0ca0*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0ca8*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0cb0*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0cb8*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/* 0x2202020200420047 */
/*0cc8*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0cd0*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0cd8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0ce0*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0ce8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0cf0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0cf8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/* 0x2202020200420047 */
/*0d08*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0d10*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0d18*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0d20*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0d28*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0d30*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0d38*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/* 0x2202020200420047 */
/*0d48*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0d50*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0d58*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0d60*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0d68*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0d70*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0d78*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/* 0x2202020200420047 */
/*0d88*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0d90*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0d98*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0da0*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0da8*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0db0*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0db8*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/* 0x2202020200420047 */
/*0dc8*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0dd0*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0dd8*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0de0*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0de8*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0df0*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0df8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/* 0x2202020200420047 */
/*0e08*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0e10*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0e18*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0e20*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0e28*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0e30*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0e38*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/* 0x2202020200420047 */
/*0e48*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0e50*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0e58*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0e60*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0e68*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0e70*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0e78*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/* 0x2202020200420047 */
/*0e88*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0e90*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0e98*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0ea0*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0ea8*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0eb0*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0eb8*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/* 0x2202020200420047 */
/*0ec8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0ed0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0ed8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0ee0*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0ee8*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*0ef0*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0ef8*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/* 0x2202020200420047 */
/*0f08*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0f10*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0f18*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0f20*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0f28*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*0f30*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0f38*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/* 0x2202020200420047 */
/*0f48*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0f50*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0f58*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0f60*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0f68*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*0f70*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*0f78*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/* 0x2202004220420047 */
/*0f88*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*0f90*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*0f98*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*0fa0*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*0fa8*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*0fb0*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*0fb8*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/* 0x2200422042004207 */
/*0fc8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*0fd0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*0fd8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*0fe0*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*0fe8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*0ff0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*0ff8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/* 0x2212004200422047 */
/*1008*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*1010*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*1018*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*1020*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*1028*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*1030*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*1038*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/* 0x2210421042020047 */
/*1048*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*1050*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*1058*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*1060*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*1068*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*1070*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*1078*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/* 0x2202004210421047 */
/*1088*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*1090*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*1098*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*10a0*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*10a8*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*10b0*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*10b8*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/* 0x2210420200421047 */
/*10c8*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*10d0*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*10d8*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*10e0*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*10e8*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*10f0*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*10f8*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/* 0x2202004210421047 */
/*1108*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*1110*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*1118*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*1120*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*1128*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*1130*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*1138*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/* 0x2202004210421047 */
/*1148*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*1150*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*1158*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*1160*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/*1168*/ IADD R0, R0, 0x20; /* 0x4800c00080001c03 */
/*1170*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*1178*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/* 0x2202004210421047 */
/*1188*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*1190*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*1198*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*11a0*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/*11a8*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*11b0*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*11b8*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/* 0x2202004210421047 */
/*11c8*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*11d0*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*11d8*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*11e0*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
/*11e8*/ FFMA.FTZ R12, R12, R12, R12; /* 0x3018000030c31c40 */
/*11f0*/ FFMA.FTZ R13, R13, R13, R13; /* 0x301a000034d35c40 */
/*11f8*/ FFMA.FTZ R14, R14, R14, R14; /* 0x301c000038e39c40 */
/* 0x2202004210421047 */
/*1208*/ ISETP.NE.AND P0, PT, R0, 0x1000, PT; /* 0x1a8ec0400001dc23 */
/*1210*/ FFMA.FTZ R15, R15, R15, R15; /* 0x301e00003cf3dc40 */
/*1218*/ FFMA.FTZ R16, R16, R16, R16; /* 0x3020000041041c40 */
/*1220*/ FFMA.FTZ R17, R17, R17, R17; /* 0x3022000045145c40 */
/*1228*/ FFMA.FTZ R19, R19, R19, R19; /* 0x302600004d34dc40 */
/*1230*/ FFMA.FTZ R18, R18, R18, R18; /* 0x3024000049249c40 */
/*1238*/ FFMA.FTZ R4, R4, R4, R4; /* 0x3008000010411c40 */
/* 0x2202004210421047 */
/*1248*/ FFMA.FTZ R5, R5, R5, R5; /* 0x300a000014515c40 */
/*1250*/ FFMA.FTZ R6, R6, R6, R6; /* 0x300c000018619c40 */
/*1258*/ FFMA.FTZ R7, R7, R7, R7; /* 0x300e00001c71dc40 */
/*1260*/ FFMA.FTZ R8, R8, R8, R8; /* 0x3010000020821c40 */
/*1268*/ FFMA.FTZ R9, R9, R9, R9; /* 0x3012000024925c40 */
/*1270*/ FFMA.FTZ R10, R10, R10, R10; /* 0x3014000028a29c40 */
/*1278*/ FFMA.FTZ R11, R11, R11, R11; /* 0x301600002cb2dc40 */
...
When I run it on Fermi GPUs, the results are very close to the theoretical peak. On Kepler GPUs, however, they are noticeably lower than expected. This is especially evident on a Tesla K40m, where the kernel reaches 3,014.91 GFLOPS (peak: 4,291 GFLOPS, 70% efficiency). The same kernel adapted for double precision reaches 1,373.83 GFLOPS, which is much closer to the theoretical peak (1,430 GFLOPS, 96% efficiency).
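For reference, this is roughly how the peak and efficiency figures above work out. It is only a sketch: it assumes the K40m runs at its 745 MHz base clock (no GPU Boost), has 15 SMX units with 192 single-precision cores and 64 double-precision units each, and that every FMA counts as 2 FLOPs. These spec values are assumptions taken from the published board specification, not something I measured.

```latex
\begin{align*}
P_{\mathrm{SP}} &= 15 \times 192 \times 2 \times 0.745\ \mathrm{GHz} = 4291.2\ \mathrm{GFLOPS}, &
\frac{3014.91}{4291.2} &\approx 70.3\% \\
P_{\mathrm{DP}} &= 15 \times 64 \times 2 \times 0.745\ \mathrm{GHz} = 1430.4\ \mathrm{GFLOPS}, &
\frac{1373.83}{1430.4} &\approx 96.0\%
\end{align*}
```

If the card were actually running at a boost clock, both peak values would be higher and both efficiencies correspondingly lower.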
Am I missing something about Kepler?