GEMM gives wrong results when compiled for sm70

I implemented a GEMM for the tensor cores on Turing. When compiled with sm75, it outputs the right result.

But when compiled with sm70 and still run on a T4 or Titan V, the same code outputs a wrong result. It seems that the sm70 build computes incorrectly. To check, I first stored the result C to shared memory, then printed the first 16x16 tile of C from the GPU version and the corresponding tile from the CPU version (the correct one). The outputs are below:
----gpu result----
131.625000 129.375000 135.500000 132.000000 120.625000 131.375000 123.062500 134.625000 133.000000 126.437500 141.375000 128.250000 133.875000 126.312500 137.000000 133.500000
127.812500 123.937500 134.125000 132.375000 119.812500 127.500000 126.062500 131.000000 129.625000 121.187500 137.000000 128.250000 127.562500 124.562500 135.375000 130.125000
124.812500 123.937500 126.625000 126.875000 120.187500 125.750000 125.187500 130.125000 129.375000 120.250000 129.875000 124.437500 131.500000 125.312500 134.250000 124.062500
122.125000 122.062500 125.937500 128.250000 117.500000 120.312500 122.437500 132.375000 122.562500 120.375000 130.375000 123.125000 122.187500 114.875000 132.750000 125.000000
129.250000 125.937500 124.812500 119.625000 119.750000 127.250000 115.437500 123.562500 132.750000 129.250000 128.375000 115.812500 130.750000 127.625000 130.000000 124.562500
135.125000 127.750000 137.625000 132.250000 127.250000 133.625000 123.125000 132.500000 135.875000 129.875000 139.375000 129.125000 136.750000 132.625000 135.125000 131.125000
119.125000 122.062500 117.000000 121.937500 120.750000 124.875000 115.750000 122.437500 129.625000 118.687500 117.750000 117.187500 126.250000 118.937500 126.375000 115.875000
123.625000 125.250000 124.812500 126.812500 127.625000 129.375000 119.562500 130.375000 128.500000 124.312500 132.250000 126.562500 132.500000 122.500000 130.500000 123.687500
129.875000 126.062500 132.625000 126.812500 118.437500 124.937500 119.437500 131.500000 126.875000 117.187500 136.000000 125.062500 133.000000 124.937500 130.875000 125.937500
131.875000 132.500000 139.250000 131.125000 126.875000 131.375000 129.625000 135.750000 137.500000 126.375000 141.375000 130.375000 133.500000 134.750000 141.250000 136.500000
118.187500 123.375000 121.437500 123.000000 114.250000 126.250000 123.062500 124.125000 123.625000 116.687500 121.812500 121.500000 126.000000 121.750000 126.937500 121.562500
127.562500 129.250000 129.375000 133.000000 123.312500 130.375000 126.375000 132.125000 129.250000 125.812500 134.250000 126.687500 131.500000 123.250000 138.000000 128.000000
133.750000 130.375000 124.625000 122.750000 122.875000 130.000000 116.625000 125.437500 136.000000 128.250000 131.750000 117.812500 137.375000 127.312500 128.750000 125.000000
130.000000 120.000000 135.625000 129.750000 116.187500 120.312500 123.500000 134.750000 125.062500 117.750000 135.625000 127.125000 127.812500 123.437500 136.375000 132.125000
123.437500 123.937500 114.437500 118.062500 125.250000 128.750000 117.937500 117.812500 128.125000 125.000000 122.062500 112.500000 130.750000 122.437500 122.375000 118.500000
118.375000 119.000000 126.625000 129.250000 114.687500 119.625000 124.937500 124.375000 121.062500 111.812500 130.875000 122.000000 125.187500 118.750000 132.625000 125.500000

----cpu result----
131.625000 129.375000 124.812500 123.937500 120.625000 131.375000 120.187500 125.750000 133.000000 126.437500 129.375000 120.250000 133.875000 126.312500 131.500000 125.312500
127.812500 123.937500 122.125000 122.062500 119.812500 127.500000 117.500000 120.312500 129.625000 121.187500 122.562500 120.375000 127.625000 124.562500 122.187500 114.875000
135.500000 132.000000 126.625000 126.875000 123.062500 134.625000 125.187500 130.125000 141.375000 128.250000 129.875000 124.437500 137.000000 133.500000 134.250000 124.062500
134.125000 132.375000 125.937500 128.250000 126.062500 131.000000 122.437500 132.375000 137.000000 128.250000 130.375000 123.125000 135.375000 130.125000 132.750000 125.000000
129.250000 125.937500 119.125000 122.062500 119.750000 127.250000 120.750000 124.875000 132.750000 129.250000 129.750000 118.687500 130.750000 127.625000 126.250000 118.937500
135.125000 127.750000 123.625000 125.250000 127.250000 133.625000 127.625000 129.375000 135.875000 129.875000 128.500000 124.312500 136.750000 132.625000 132.500000 122.500000
124.812500 119.625000 117.000000 121.937500 115.437500 123.562500 115.750000 122.437500 128.375000 115.812500 117.750000 117.187500 130.000000 124.562500 126.375000 115.875000
137.625000 132.250000 124.812500 126.812500 123.125000 132.500000 119.562500 130.375000 139.375000 129.125000 132.250000 126.562500 135.125000 131.125000 130.500000 123.687500
129.875000 126.062500 118.187500 123.375000 118.437500 124.937500 114.250000 126.250000 126.875000 117.187500 123.625000 116.687500 133.000000 124.937500 126.000000 121.750000
131.875000 132.500000 127.562500 129.250000 126.875000 131.375000 123.312500 130.375000 137.500000 126.375000 129.250000 125.812500 133.500000 134.750000 131.500000 123.250000
132.625000 126.812500 121.437500 123.000000 119.437500 131.500000 123.062500 124.125000 136.000000 125.062500 121.812500 121.500000 130.875000 125.937500 126.937500 121.562500
139.250000 131.125000 129.375000 133.000000 129.625000 135.750000 126.375000 132.125000 141.375000 130.375000 134.250000 126.687500 141.250000 136.500000 138.000000 128.000000
133.750000 130.375000 123.437500 123.937500 122.875000 130.000000 125.250000 128.750000 136.000000 128.250000 128.125000 125.000000 137.375000 127.312500 130.750000 122.437500
130.000000 120.000000 118.375000 119.000000 116.187500 120.312500 114.687500 119.625000 125.062500 117.750000 121.062500 111.812500 127.812500 123.437500 125.187500 118.750000
124.625000 122.750000 114.437500 118.062500 116.625000 125.437500 117.937500 117.812500 131.750000 117.875000 122.062500 112.500000 128.750000 125.000000 122.375000 118.500000
135.625000 129.750000 126.625000 129.250000 123.500000 134.750000 124.937500 124.375000 135.625000 127.125000 130.875000 122.000000 136.375000 132.125000 132.625000 125.500000

Comparing these two results, the GPU version seems to produce the right values but in the wrong order, as shown in the attached graph.
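To make the ordering difference concrete, here is a minimal Python sketch that tests one possible reading of the two dumps: that the GPU tile equals the CPU tile once bit 1 of the row index is exchanged with bit 1 of the column index (that is, the two off-diagonal 2x2 sub-blocks of every 4x4 block trade places). The function names `swap_bit1` and `matches_after_remap` are hypothetical, only the top-left 4x4 block of each dump is copied in, and the tolerance is an assumption to absorb the small rounding differences visible elsewhere in the tiles. It only describes the pattern in the printed numbers and says nothing about the cause.

```python
def swap_bit1(r, c):
    """Exchange bit 1 of the row index with bit 1 of the column index."""
    return (r & ~2) | (c & 2), (c & ~2) | (r & 2)


def matches_after_remap(gpu, cpu, tol=0.25):
    """Return True if gpu[r][c] is close to cpu[r'][c'] for every element,
    where (r', c') = swap_bit1(r, c). tol is an assumed rounding tolerance."""
    n = len(gpu)
    for r in range(n):
        for c in range(n):
            rr, cc = swap_bit1(r, c)
            if abs(gpu[r][c] - cpu[rr][cc]) > tol:
                return False
    return True


# Top-left 4x4 block of the GPU dump above.
gpu_block = [
    [131.6250, 129.3750, 135.5000, 132.0000],
    [127.8125, 123.9375, 134.1250, 132.3750],
    [124.8125, 123.9375, 126.6250, 126.8750],
    [122.1250, 122.0625, 125.9375, 128.2500],
]

# Top-left 4x4 block of the CPU dump above.
cpu_block = [
    [131.6250, 129.3750, 124.8125, 123.9375],
    [127.8125, 123.9375, 122.1250, 122.0625],
    [135.5000, 132.0000, 126.6250, 126.8750],
    [134.1250, 132.3750, 125.9375, 128.2500],
]

print(matches_after_remap(gpu_block, cpu_block))  # prints True
```

Pasting the full 16x16 tiles into `gpu_block` and `cpu_block` runs the same check over the whole tile, since the function takes its size from the input.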