How can I test MPI/CUDA programs without a GPU cluster?

I have to write programs in CUDA with MPI and run them on a GPU cluster, but I don't have a cluster. Is there a solution, such as a simulator or something else?

You can just let multiple instances of the code (one per MPI process) share the same GPU. While this will not give any indication of performance (it will most likely be slower than running the same problem as a single process on one GPU), you can at least test whether it works.
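
As a rough illustration of the idea above, here is a minimal sketch in which every MPI rank picks a device with `rank % device_count`, so on a machine with a single GPU all ranks end up on device 0. The names `pick_device` and `fill` are hypothetical helpers made up for this example, and error checking is kept to a minimum.

```cuda
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical helper: map an MPI rank to a device index.
// With only one GPU present, every rank is mapped to device 0.
int pick_device(int rank, int device_count) {
    return rank % device_count;
}

// Trivial kernel: write `value` into every element of `out`.
__global__ void fill(int *out, int value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess || device_count == 0) {
        fprintf(stderr, "rank %d: no CUDA device found\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    // All ranks share the available GPU(s).
    int dev = pick_device(rank, device_count);
    cudaSetDevice(dev);

    // Each rank fills a small device buffer with its own rank number.
    const int n = 256;
    int *d_buf = nullptr;
    cudaMalloc(&d_buf, n * sizeof(int));
    fill<<<(n + 127) / 128, 128>>>(d_buf, rank, n);

    // Copy one element back; cudaMemcpy waits for the kernel to finish.
    int first = -1;
    cudaMemcpy(&first, d_buf, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_buf);

    // Combine the per-rank results on rank 0 to check MPI and CUDA together.
    int sum = 0;
    MPI_Reduce(&first, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("sum of ranks = %d (expected %d)\n", sum, size * (size - 1) / 2);
    }

    MPI_Finalize();
    return 0;
}
```

How you build this depends on your installation; one common approach, assuming an MPI compiler wrapper named `mpicxx` is available, is `nvcc -ccbin mpicxx test.cu -o test`, followed by something like `mpirun -np 4 ./test`. Note that sharing only works if the GPU's compute mode allows more than one process; a device set to exclusive-process mode will reject the additional ranks.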