How to optimize threads and blocks in relation to warps and SMs

I am using a GT 240 GPU.
How should I choose the number of threads and blocks in relation to warps and SMs (streaming multiprocessors)?

You can enable the CUDA compute profiler
to see how many blocks and threads cuBLAS uses for its kernels, and take those launch configurations as a reference.