Multi-GPU initialization time

Hi all,

I have a problem with a multi-GPU application; my card is a Tesla S1070. I found that the initialization time with multiple GPUs is extremely long. With the simpleMultiGPU program, it takes nearly 4 sec with 4 devices, but only about 1 sec if I set just 1 device. Is this normal? My host machine has eight cores.

My second question: in simpleMultiGPU-style programs, it looks like I need to initialize all devices again each time I launch the GPU processing threads. Is it possible to initialize all devices only once, as in single-GPU programming?

Thanks,
Mian