I don’t think you want either.
Firstly, you want at least enough blocks to fill your 30 MPs, so that’s at least 30 blocks to start with. Then look at your register and shared memory allocation: it may be that you can’t get 1024 threads on an MP. This is not a problem, as you get pretty much optimal performance with only 512.
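To make the "at least 30 blocks" point concrete, here is a rough host-side sketch in plain C++. The names `blocksNeeded` and `fillsAllMPs` are made up for illustration, and it assumes one thread per element and the 30 MPs quoted above; on a real device you would read the MP count from `cudaDeviceProp::multiProcessorCount` rather than hard-coding it.

```cpp
// Assumption: 30 multiprocessors, the figure quoted in this thread.
constexpr int kNumMPs = 30;

// Blocks needed to cover n elements with one thread per element
// (ceiling division, so a partial last block is still counted).
int blocksNeeded(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// True if the grid has at least one block for every multiprocessor.
bool fillsAllMPs(int numBlocks, int numMPs = kNumMPs) {
    return numBlocks >= numMPs;
}
```

For a small problem of 2048 elements, 512 threads per block gives only 4 blocks, so most of the MPs would sit idle, while 64 threads per block gives 32 blocks and at least covers all 30. For large problems the check passes trivially and the other limits matter more.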
There are quite a few other things to take into account. In my application, more threads per block are more efficient because they reduce global memory loads; however, I have to sync my blocks quite regularly, so smaller ones are better to reduce waiting. I compromise at 128 threads per block.
Also, you can only have 8 blocks per MP. This can be an issue if you want 32-thread blocks, since 8 of them put only 256 threads on an MP. Given that resources are allocated in lumps of 64, 32-thread blocks are wasteful in that respect too (as are 96-thread blocks and other such numbers).
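The arithmetic behind those limits can be sketched like this. It is only a back-of-the-envelope model in plain C++: the struct and function names are hypothetical, the limits (8 blocks per MP, 1024 threads per MP, lumps of 64) are the figures from this thread, and it ignores register and shared memory pressure entirely, which can lower the real numbers further.

```cpp
#include <algorithm>

// Figures quoted in this thread; they are specific to this hardware.
constexpr int kMaxBlocksPerMP   = 8;
constexpr int kMaxThreadsPerMP  = 1024;
constexpr int kAllocGranularity = 64;  // "lumps of 64"

struct MPEstimate {
    int residentBlocks;   // blocks that fit on one MP at once
    int residentThreads;  // threads those blocks actually run
    int paddingPerBlock;  // allocation slots wasted by rounding up to 64
};

// Simplified estimate: ignores registers and shared memory.
MPEstimate estimatePerMP(int threadsPerBlock) {
    // Whichever limit bites first: the block cap or the thread cap.
    int blocks = std::min(kMaxBlocksPerMP, kMaxThreadsPerMP / threadsPerBlock);
    // Round the block size up to the next multiple of the granularity.
    int padded = (threadsPerBlock + kAllocGranularity - 1)
                 / kAllocGranularity * kAllocGranularity;
    return {blocks, blocks * threadsPerBlock, padded - threadsPerBlock};
}
```

Under this model, 32-thread blocks end up with 8 blocks and 256 threads per MP, with 32 slots of padding per block. 96-thread blocks reach 768 threads with the same padding, and 128-thread blocks reach the full 1024 with no padding at all.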
I think the general method is to play around and see what works best.