Error when using multiple GPUs

Hi there,

I'm trying to write a program that uses 4 Tesla cards on a GPU cluster, but when I try to set the devices I get this error message: "cudaSafeCall() Runtime API error : setting the device when a process is active is not allowed."
The code is shown below:

......
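// One stream per device
cudaStream_t *streams = new cudaStream_t[deviceCount];

for ( int i = 0; i < deviceCount; i++ ){
    // Make device i current for this host thread
    cutilSafeCall( cudaSetDevice( i ) );
    // Create a stream on the current device
    cutilSafeCall( cudaStreamCreate( &streams[i] ) );
}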
......

The error message comes from the line "cutilSafeCall( cudaSetDevice( i ) );".

I checked the Simple Multi-GPU example in the SDK, but I couldn't see any difference in how it does the same thing.
Can anyone help me with this problem?

BTW, I'm using CUDA 4.0 with the 64-bit version of VS 2008.

Best,
Feng