I am trying to run a program on my cluster’s GPU partition (Tesla P100s). The program repeatedly accesses GPU memory from the host side; we realize this back-and-forth is not ideal, but we do not have much CUDA expertise, and some of the calculations need to run on the CPU. Every time the program touches GPU memory there is a large delay, and our CPU-only version of the program runs much faster overall.
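For context, the structure of our program is roughly like the following. This is a minimal sketch, not our actual code; the kernel, array sizes, and iteration count are made up, but the per-iteration host/device round trip is the pattern we use:

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Illustrative kernel only -- our real computation is different.
__global__ void scale(float *d, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main(void) {
    const int n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, n * sizeof(float));

    for (int iter = 0; iter < 100; ++iter) {
        // Host -> device copy on every iteration (PCIe round trip).
        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<(n + 255) / 256, 256>>>(d, n, 1.001f);
        // Device -> host copy so the CPU-only step can run on h.
        cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
        // ... CPU-only calculation on h happens here ...
    }

    cudaFree(d);
    free(h);
    return 0;
}
```

The slow part for us appears to be the memory accesses themselves, not the kernel.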
What is the likely cause of this startup lag, and is there a way to reduce it without rewriting the whole program?