On all pre-Fermi cards (compute capability 1.x), a streaming multiprocessor (SM) is a cluster of 8 scalar cores that share resources such as the register file, shared memory, and the instruction scheduler. The SM, not the individual core, is the basic execution unit in NVIDIA hardware, and SMs are what the API function you are calling is counting. That is the reason for the factor of 8 discrepancy: multiply the reported count by 8 to get the number of scalar cores.
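To make the arithmetic concrete, here is a minimal sketch of converting an SM count into a core count. The function names `coresPerSM` and `totalCores` are hypothetical, and the SM count is assumed to come from whatever API you are using (in the CUDA runtime that would be the `multiProcessorCount` field of `cudaDeviceProp`, if that is the call in question). The Fermi figures of 32 and 48 cores per SM are not from the explanation above; they are included only to show why Fermi is the exception.

```cpp
#include <cassert>
#include <stdexcept>

// Scalar cores per SM for the architectures discussed here.
// Assumption: compute capability 1.x is pre-Fermi (8 cores per SM);
// 2.0 and 2.1 are Fermi (32 and 48 cores per SM respectively).
// Anything else is outside the scope of this sketch.
int coresPerSM(int major, int minor) {
    if (major == 1) return 8;
    if (major == 2) return (minor == 0) ? 32 : 48;
    throw std::invalid_argument("compute capability not covered by this sketch");
}

// Total scalar cores = number of SMs reported by the API * cores per SM.
int totalCores(int smCount, int major, int minor) {
    assert(smCount >= 0);
    return smCount * coresPerSM(major, minor);
}
```

For example, a compute capability 1.3 device that reports 30 multiprocessors gives `totalCores(30, 1, 3)`, which is 30 * 8 = 240 scalar cores.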