Strange “all CUDA-capable devices are busy or unavailable” error

I have a system running a GTX 580 and two Tesla C2070s on Windows 7 x64. Past versions of the code ran fine with one instance on each of the Teslas. However, I have started getting a strange error.

The first instance runs fine.

The second instance (it does not matter which card is run first) gets an “all CUDA-capable devices are busy or unavailable” error on the first two cudaMalloc calls, then runs fine. If I put in a check that keeps retrying cudaMalloc until it succeeds, it fails twice, succeeds on the third attempt, and the rest of the code runs fine.
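The retry workaround described above can be sketched as a small host-side helper. This is a minimal sketch under stated assumptions, not the poster's actual code: `retry_until_success` is a hypothetical name, and it treats a status of 0 as success because `cudaSuccess` is 0. It takes a generic callable so the retry logic can be exercised without a GPU; the comment shows how a `cudaMalloc` call might be wrapped.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper (not part of the CUDA API): calls `attempt` up to
// `max_attempts` times and stops at the first call that returns 0
// (the value of cudaSuccess). Returns the status of the last call made.
// If `attempts_used` is non-null, it receives the number of calls made.
int retry_until_success(const std::function<int()>& attempt,
                        int max_attempts,
                        int* attempts_used = nullptr) {
    assert(max_attempts > 0);
    int status = -1;
    int n = 0;
    while (n < max_attempts) {
        status = attempt();
        ++n;
        if (status == 0) {
            break;  // success, stop retrying
        }
    }
    if (attempts_used != nullptr) {
        *attempts_used = n;
    }
    return status;
}

// Example use with the CUDA runtime (needs cuda_runtime.h and a GPU);
// the buffer size here is an assumption for illustration:
//   float* enh = nullptr;
//   size_t bytes = sizeof(float) * xsize * ysize;
//   int used = 0;
//   int status = retry_until_success(
//       [&] { return static_cast<int>(
//                 cudaMalloc(reinterpret_cast<void**>(&enh), bytes)); },
//       5, &used);
```

Bounding the number of attempts keeps the program from spinning forever if the device never becomes available. Note that retrying works around the error rather than explaining why the first calls fail.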

Both Teslas return cudaComputeModeDefault when queried.
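The per-device compute mode comes from `cudaDeviceProp::computeMode`, filled in by `cudaGetDeviceProperties`. The sketch below assumes the numeric values of the runtime's `cudaComputeMode` enum (Default 0, Exclusive 1, Prohibited 2, ExclusiveProcess 3). The enum constants and helper names are hypothetical, and the real query is shown only in a comment so the interpretation logic compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <string>

// Hypothetical constants mirroring the CUDA runtime's cudaComputeMode values.
// In real code the value would be read per device with:
//   cudaDeviceProp prop;
//   cudaGetDeviceProperties(&prop, dev);
//   int mode = prop.computeMode;
enum ComputeMode {
    kDefault = 0,          // multiple host threads/processes may use the device
    kExclusive = 1,        // only one thread in one process may use the device
    kProhibited = 2,       // no thread may use the device
    kExclusiveProcess = 3  // only one process (any of its threads) may use it
};

// Human-readable name for a computeMode value.
std::string compute_mode_name(int mode) {
    switch (mode) {
        case kDefault:          return "Default";
        case kExclusive:        return "Exclusive (thread)";
        case kProhibited:       return "Prohibited";
        case kExclusiveProcess: return "Exclusive Process";
        default:                return "Unknown";
    }
}

// True if a second, independent process may open its own context on the
// device while another process already holds one. Only Default allows that.
bool allows_concurrent_processes(int mode) {
    return mode == kDefault;
}
```

Since Default mode permits several processes on the same device, the compute mode by itself would not be expected to produce the “busy or unavailable” error in this setup.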

Any help would be appreciated.

Could you post sample code that reproduces it?

Well, it gets even weirder. I suspect something is wrong with the card, because the code runs fine on another system with two Tesla cards. I suppose I can pull out the GTX 580 and see whether that is causing the problem.

Basically it's just this:

device=0;
xsize=512;
ysize=512;
nosts=75;
float *enh, *gapmh, *ekmh; // pointers to device memory

init_cuda(device); // sets which card to use

// this allocates the device memory
error=cudaGetLastError();
if (error!=0) cout