Multi-GPU CUDA program getting stuck

I work on a remote machine that has a Tesla K20c and a GeForce GTX 560 Ti. I want to try the Simple Multi-GPU sample from the CUDA Samples. However, the program hangs indefinitely without printing any error message. It gets stuck at the cudaSetDevice call. The same thing happens with my own multi-GPU program.
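To narrow down where it stops, a minimal sketch along these lines can help. This is not the sample's code: the loop, the printed messages, and the cudaFree(0) call are assumptions added for illustration. It only uses standard CUDA runtime calls, checks every return code, and flushes stdout before each call so that the last line printed over ssh shows which call did not return.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the failing call and the runtime's error string, then report failure.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    int count = 0;
    if (!check(cudaGetDeviceCount(&count), "cudaGetDeviceCount")) return 1;
    printf("found %d device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (!check(cudaGetDeviceProperties(&prop, dev), "cudaGetDeviceProperties")) return 1;

        // Flush before the call so this line is visible even if the call hangs.
        printf("device %d (%s): calling cudaSetDevice...\n", dev, prop.name);
        fflush(stdout);
        if (!check(cudaSetDevice(dev), "cudaSetDevice")) return 1;

        // Context creation can be lazy; cudaFree(0) is a common way to force it,
        // so a hang during initialization shows up here rather than later.
        printf("device %d: forcing context creation...\n", dev);
        fflush(stdout);
        if (!check(cudaFree(0), "cudaFree(0)")) return 1;

        printf("device %d: ok\n", dev);
    }
    return 0;
}
```

If the last line printed is the "calling cudaSetDevice..." message for one particular device, that at least tells which of the two GPUs is involved.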

I don't think the problem is with cudaSetDevice() itself: when I ran the same program on a machine with a single GPU, it worked fine. It also worked fine on this dual-GPU machine for some time, but now it gets stuck.

What could be the reason? The network? Or the GPUs being used by another user? I have checked many times, and I am the only user logged in on that machine.

Edit: I meant that cudaSetDevice() never returns. I am using ssh to connect to the remote host.

Please note that the same program ran successfully on some earlier runs, but now it gets stuck.