I noticed that if I run a CUDA program on a GPU that has been sitting idle for a relatively long time (6+ hours), the wall time is longer than for some of the other runs.
1) Is this true, and if so, what is the reason?
2) If it is true, would it be better to run a smaller job first on an idle GPU to let it warm up?
Are these display or non-display cards?
If they are non-display cards, it might well be the driver shutting itself down after a period of inactivity, in which case what you are seeing on the first run is initialization overhead that only happens once. You could try running nvidia-smi in daemon mode: it acts as a CUDA client that keeps the driver active and prevents it from shutting down (and it preserves things like compute exclusivity settings too).
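One way to check whether the slowdown really is a one-time cost is to run the same program several times back to back and compare the first wall time with the rest. The following is only a rough sketch: `time_runs` and `first_run_overhead` are names made up for this post, and `./my_cuda_app` in the comment is a placeholder for your own binary.

```python
import statistics
import subprocess
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times back to back; return wall times in seconds."""
    if repeats < 2:
        raise ValueError("need at least 2 runs to compare first vs. later")
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's stdout so printing does not skew the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def first_run_overhead(times):
    """First-run wall time minus the median of all later runs."""
    return times[0] - statistics.median(times[1:])


# Hypothetical usage, with a placeholder binary name:
#   times = time_runs(["./my_cuda_app"])
#   print(times, first_run_overhead(times))
```

If the overhead is large only on the very first run after a long idle period and close to zero otherwise, that points to one-time initialization rather than anything in your kernels.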
If this is a display card, it is a bit harder to explain, unless there is some sort of power manager active that is turning down the memory and shader clocks on inactive cards. It might take a few seconds to bring the clocks back up again. The NVIDIA control panel or settings app should show you the card's clocks before a run, which would let you confirm this.
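If you would rather script the clock check than look at a GUI, something like the sketch below might do. It assumes `nvidia-smi` is on your PATH and that your driver is recent enough to support the `--query-gpu` option with the `clocks.sm` and `clocks.mem` fields; older drivers may not. `parse_clocks` and `query_clocks` are names invented for this example.

```python
import subprocess


def _to_mhz(field):
    """Convert one CSV field to int MHz, or None if the card reports no value."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_clocks(csv_text):
    """Parse 'sm, mem' lines (one per GPU) into (sm_mhz, mem_mhz) tuples."""
    clocks = []
    for line in csv_text.strip().splitlines():
        sm, mem = line.split(",")
        clocks.append((_to_mhz(sm), _to_mhz(mem)))
    return clocks


def query_clocks():
    """Ask nvidia-smi for current SM and memory clocks (assumes it is installed)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=clocks.sm,clocks.mem",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_clocks(result.stdout)
```

Calling `query_clocks()` right before a run and again during it should show whether the card was sitting at reduced clocks while idle.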