Is it faster to use many threads or many blocks? For example, if I were doing something 1024 times, would it be faster to use 2 blocks with 512 threads or 512 blocks with 2 threads?
It depends on your problem, but most of the time you want to run blocks with a lot of threads.
I don’t know what GPU you are working on, but let’s assume you are using a GTX 280. This GPU has 30 Streaming Multiprocessors (SMs), and each multiprocessor consists of 8 Scalar Processors (SPs). You can run up to 8 blocks per SM, but that depends on the number of threads per block: the maximum number of active threads per SM is 1024, so if you use 512 threads per block, only 2 blocks fit on each SM.
So the maximum number of blocks that can run concurrently on your device is Number_of_SMs * 8 (assuming your blocks are small enough for 8 of them to fit on an SM), which is 240 in the case of the GTX 280. That doesn’t mean you can’t launch your kernel with more than 240 blocks; you can launch it with thousands of blocks if you want, but your GPU will only process 240 of them at a time. Once a block finishes its work, the GPU launches the next block on the SM that has become available.
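So your answer is probably (as it is most of the time): launch your kernel with blocks of many threads (tip: use a multiple of 32 threads, the warp size), so that your work runs concurrently in one step instead of leaving blocks waiting for a free slot. That waiting is what happens with 512 blocks of 2 threads each: 512 blocks is more than the 240 that can be resident at once, and since threads are executed in warps of 32, each 2-thread block still occupies a whole warp with 30 of its lanes idle. Don’t go to the other extreme either: 2 blocks of 512 threads fit on only 2 of the 30 SMs, so for 1024 items a few blocks of 128 or 256 threads spread the work over more SMs.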
Hope it helps!!