How to control the number of SMs used during kernel execution?

I mean, is there a way to specify the number of SMs used during kernel execution? For example, if there are 16 SMs on the GPU, I only want to use 2 of them.

There isn’t a programmatic interface that will let you do that. You can use the occupancy calculator and data from ptxas to calculate how many active blocks per SM will run, and then launch twice that number of blocks, which is enough to fill two SMs. It severely restricts the total amount of work your kernel will do, and leaves most of your GPU idle, but it can be achieved.
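A minimal sketch of that workaround, under some assumptions: `scaleKernel`, `gridSizeForSMs`, and `launchOnFewSMs` are hypothetical names made up for illustration, and the active-blocks-per-SM figure is queried with the runtime call `cudaOccupancyMaxActiveBlocksPerMultiprocessor`, which comes from later CUDA toolkits and stands in here for the occupancy calculator spreadsheet plus ptxas output. Note that CUDA does not guarantee which SMs the blocks land on, so this caps the launch at roughly two SMs' worth of resident blocks rather than pinning it to two particular SMs.

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel: scales an array in place. The grid-stride loop
// keeps it correct no matter how few blocks are launched.
__global__ void scaleKernel(float* data, int n, float factor) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        data[i] *= factor;
    }
}

// Number of blocks needed to fill `targetSMs` multiprocessors, given how many
// blocks of the kernel can be resident on a single SM at once.
inline int gridSizeForSMs(int activeBlocksPerSM, int targetSMs) {
    return activeBlocksPerSM * targetSMs;
}

// Launches scaleKernel with only enough blocks to occupy about `targetSMs` SMs.
// Returns the grid size that was used, or -1 on any CUDA error.
int launchOnFewSMs(float* d_data, int n, int targetSMs,
                   int threadsPerBlock = 256) {
    int activeBlocksPerSM = 0;
    // Assumption: a toolkit new enough to provide this occupancy query.
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &activeBlocksPerSM, scaleKernel, threadsPerBlock, 0);
    if (err != cudaSuccess || activeBlocksPerSM <= 0) return -1;

    int grid = gridSizeForSMs(activeBlocksPerSM, targetSMs);
    scaleKernel<<<grid, threadsPerBlock>>>(d_data, n, 2.0f);

    // Check both the launch itself and the kernel's execution.
    if (cudaGetLastError() != cudaSuccess) return -1;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;
    return grid;
}
```

Calling `launchOnFewSMs(d_data, n, 2)` on a device buffer would launch two SMs' worth of blocks. The grid-stride loop is a deliberate choice: with so few blocks, each thread has to walk over many elements for the whole array to be processed.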

You can’t yet. The kernel will always use as many SMs as it can.

When Fermi comes out, this will change (provided that you’re running on Fermi hardware, at least). The API specifics for how to control this are still more or less under wraps, though you might find something if you dig through the 3.0 beta toolkit.