Buying GPUs for CUDA simulations

I have $4000 to spend on GPUs and want to run CUDA Monte Carlo simulations. The kernel is fairly small, so I have no preference for the number of GPUs. I just want the setup that runs the most simulations per second.

Note: since the kernels are so small, I will need multiple streams on each GPU. If I understand correctly, compute capability 3.5 devices are very good at running many streams concurrently thanks to Hyper-Q. Just something to consider.
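A minimal sketch of what "multiple streams per GPU" can look like for a small Monte Carlo kernel follows. Everything in it is an assumption for illustration: the `mcBatch` kernel (a pi estimate standing in for the real simulation), the xorshift32 generator, and the stream, batch and launch sizes. Batches are issued round-robin across streams, so on a Hyper-Q capable (compute capability 3.5) device kernels from different streams can run concurrently without the false serialization that a single hardware queue causes on older GPUs. How many hardware queues the runtime actually uses is set by the `CUDA_DEVICE_MAX_CONNECTIONS` environment variable (default 8, maximum 32). The program reports samples per second, which is the figure to compare across candidate cards.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Exit main() with an error message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical RNG for illustration only: one step of Marsaglia's xorshift32.
__host__ __device__ inline unsigned int xorshift32(unsigned int &state)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Turn a hit count into a pi estimate (fraction inside the quarter circle times 4).
__host__ __device__ inline double piFromHits(unsigned long long hits,
                                             unsigned long long samples)
{
    return 4.0 * static_cast<double>(hits) / static_cast<double>(samples);
}

// Hypothetical stand-in for a small Monte Carlo kernel: each thread draws
// samplesPerThread points in the unit square, counts those inside the
// quarter circle, and adds its count to *hits.
__global__ void mcBatch(unsigned int seed, int samplesPerThread, unsigned int *hits)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int state = seed ^ (tid * 2654435761u);
    if (state == 0) state = 1;  // xorshift32 must not start at 0
    unsigned int local = 0;
    for (int i = 0; i < samplesPerThread; ++i) {
        float x = xorshift32(state) * 2.3283064e-10f;  // scale to [0, 1]
        float y = xorshift32(state) * 2.3283064e-10f;
        if (x * x + y * y <= 1.0f) ++local;
    }
    atomicAdd(hits, local);
}

int main()
{
    const int kStreams = 8;  // assumption: matches the default CUDA_DEVICE_MAX_CONNECTIONS
    const int kBatchesPerStream = 64;
    const int kBlocks = 4, kThreads = 128, kSamplesPerThread = 256;  // deliberately small launches
    const int kBatches = kStreams * kBatchesPerStream;

    unsigned int *dHits = nullptr, *hHits = nullptr;
    CHECK(cudaMalloc(&dHits, kBatches * sizeof(unsigned int)));
    CHECK(cudaMemset(dHits, 0, kBatches * sizeof(unsigned int)));
    CHECK(cudaMallocHost(&hHits, kBatches * sizeof(unsigned int)));  // pinned, so copies can be async

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    auto t0 = std::chrono::steady_clock::now();
    // Issue batches round-robin so consecutive launches land in different streams.
    for (int b = 0; b < kBatchesPerStream; ++b) {
        for (int s = 0; s < kStreams; ++s) {
            int idx = s * kBatchesPerStream + b;
            mcBatch<<<kBlocks, kThreads, 0, streams[s]>>>(1234u + idx, kSamplesPerThread,
                                                          dHits + idx);
        }
    }
    CHECK(cudaGetLastError());
    // Each stream copies back its own slice of results once its kernels finish.
    for (int s = 0; s < kStreams; ++s) {
        int off = s * kBatchesPerStream;
        CHECK(cudaMemcpyAsync(hHits + off, dHits + off, kBatchesPerStream * sizeof(unsigned int),
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());
    double secs = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();

    unsigned long long hits = 0;
    for (int i = 0; i < kBatches; ++i) hits += hHits[i];
    unsigned long long samples = 1ULL * kBatches * kBlocks * kThreads * kSamplesPerThread;
    std::printf("pi ~ %.6f, %.3e samples/s\n", piFromHits(hits, samples), samples / secs);

    for (int s = 0; s < kStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFreeHost(hHits));
    CHECK(cudaFree(dHits));
    return 0;
}
```

Running the same program with `kStreams = 1` on a given card gives a rough idea of how much the extra streams actually help for a kernel of this size.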

Thank you in advance for your recommendation.

You have a great number of options:

If the GPU will be under constant use and you value reliability, double precision, two copy engines, a long product lifecycle and a dedicated Windows compute driver (TCC), and you need 12 GB of memory, a single Tesla K40 will do the trick.

Alternatively, get a few GTX Titans, which have a double-precision setting in the driver and are also considered reliable.

If you need raw single-precision (32-bit) GFLOPS, get several GTX 780 Ti cards: less memory, but more FLOPS.
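To compare the options on paper, a rough sketch like the one below lists, for each installed GPU, the properties mentioned above (compute capability, copy engines, memory) plus a theoretical single-precision peak, computed as SMs × cores per SM × 2 FLOPs per fused multiply-add × clock. Assumptions: the helpers `coresPerSm` and `peakSpGflops` are hypothetical, only the Kepler value of 192 cores per SM is filled in, and the clock is whatever the driver reports, which may differ from the sustained clock under load. A peak figure is only a proxy; for a small kernel, timing the actual simulation on the card is the better guide.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: CUDA cores per SM for a compute capability major version.
// Only Kepler (3.x, 192 cores per SMX) is filled in; 0 means "unknown".
inline int coresPerSm(int major)
{
    return major == 3 ? 192 : 0;
}

// Hypothetical helper: theoretical single-precision peak in GFLOPS, counting a
// fused multiply-add as 2 FLOPs. clockKhz is in kHz, as CUDA reports it.
inline double peakSpGflops(int smCount, int coresPerSmCount, int clockKhz)
{
    return smCount * coresPerSmCount * 2.0 * clockKhz / 1.0e6;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA devices found\n");
        return 1;
    }
    double totalGflops = 0.0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);  // used for the name and total memory
        int major = 0, minor = 0, sms = 0, clockKhz = 0, copyEngines = 0;
        cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev);
        cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev);
        cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
        cudaDeviceGetAttribute(&clockKhz, cudaDevAttrClockRate, dev);
        cudaDeviceGetAttribute(&copyEngines, cudaDevAttrAsyncEngineCount, dev);

        int cores = coresPerSm(major);
        double gflops = cores ? peakSpGflops(sms, cores, clockKhz) : 0.0;
        totalGflops += gflops;
        std::printf("GPU %d: %s, CC %d.%d, %d SMs, %.1f GiB, %d copy engine(s), ",
                    dev, prop.name, major, minor, sms,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0), copyEngines);
        if (cores) std::printf("~%.0f SP GFLOPS peak\n", gflops);
        else       std::printf("SP peak unknown (cores per SM not in table)\n");
    }
    std::printf("Total SP peak across known GPUs: ~%.0f GFLOPS\n", totalGflops);
    return 0;
}
```

As a worked instance of the formula, 15 SMs × 192 cores × 2 × 0.875 GHz gives 5040 GFLOPS; summing this over several cheaper cards is how "a bunch of" consumer GPUs can out-run a single Tesla on paper.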

Thank you, kind sir.