Sum reduction that works on Fermi, Kepler and Maxwell

But then according to

The number of strided values (elements of the input array) that each thread in a block loads is determined from the size of the array to be summed. In this case, for Kepler/Maxwell, each block has 256 threads, 64 in the x dimension and 4 in the y dimension, so NbxGroups refers to the number of x thread blocks of size 64.
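To make the description above concrete, here is a rough host-side sketch of how such a determination could look. Only the 64 x 4 block shape and the name NbxGroups come from the text; the function planReduction, the struct ReductionPlan, the doubling heuristic and the maxLoads/minBlocks parameters are my own assumptions for illustration, not the actual code being discussed.

```cpp
#include <cstddef>

// Hypothetical launch plan. Only the 64 x 4 block shape and the
// NbxGroups idea come from the description; the rest is illustrative.
struct ReductionPlan {
    unsigned blockX;          // threads per block in x (64 in the description)
    unsigned blockY;          // threads per block in y (4 in the description)
    unsigned loadsPerThread;  // strided input elements each thread sums
    std::size_t nbxGroups;    // number of 64-wide x groups covering the input
};

// Assumed heuristic: keep doubling the per-thread load count (up to maxLoads)
// as long as the input would still fill at least minBlocks blocks afterwards.
ReductionPlan planReduction(std::size_t n,
                            unsigned maxLoads = 16,
                            std::size_t minBlocks = 64) {
    ReductionPlan p{64, 4, 1, 0};
    const std::size_t threadsPerBlock =
        static_cast<std::size_t>(p.blockX) * p.blockY;  // 256 threads
    while (p.loadsPerThread < maxLoads &&
           n / (threadsPerBlock * p.loadsPerThread * 2) >= minBlocks) {
        p.loadsPerThread *= 2;
    }
    // Each x group of 64 threads consumes 64 * loadsPerThread elements.
    const std::size_t perGroup =
        static_cast<std::size_t>(p.blockX) * p.loadsPerThread;
    p.nbxGroups = (n + perGroup - 1) / perGroup;  // ceiling division
    return p;
}
```

With these assumed defaults, a small array keeps one load per thread and a large one is capped at maxLoads. The block shape itself is fixed at compile time in this sketch, which is the part the question below is about.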

Does that mean the block configuration is hard-coded, so the code won't work on Fermi?