I have $4000 to spend on GPUs for running CUDA Monte Carlo simulations. The kernel is pretty small, so I have no preference for the number of GPUs; I just want the setup that will run the most simulations per second.
Note: Since the kernels are so small, I will need multiple streams on each GPU to keep it busy. If I understand correctly, compute capability 3.5 devices are really good at running concurrent streams thanks to Hyper-Q technology. Just something to consider.
Thank you in advance for your recommendation.