[ Runtime initialization very slow as GPU count increases, Linux ]

Hi,

We recently bought a machine equipped with
8 GPUs (Tesla C2050). The system runs Linux 2.6.32
and I installed CUDA 4.0. The CUDA library initialization
takes approx. 7 seconds, which is an issue since we run
short tests where this time dominates the total run time.

I use the low-level CUDA driver API (i.e. cuInit()).

I do not know if the initialization time is linear
in the GPU count, but it seems to be.
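One way to check this is to time cuInit() and the per-device context creation separately. The following is only a rough sketch, not a definitive benchmark: the helper names (now_seconds, time_call, do_init, do_ctx) and the USE_CUDA macro are made up for illustration, and it assumes the three-argument cuCtxCreate() found in CUDA 4.0-era toolkits (newer toolkits may declare it differently). The CUDA part is only compiled when USE_CUDA is defined.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds, read from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run fn(arg) once and return the elapsed wall-clock time in seconds. */
static double time_call(void (*fn)(void *), void *arg)
{
    double start = now_seconds();
    fn(arg);
    return now_seconds() - start;
}

#ifdef USE_CUDA
#include <cuda.h>

/* Driver initialization; cuInit() currently requires flags == 0. */
static void do_init(void *unused)
{
    (void)unused;
    if (cuInit(0) != CUDA_SUCCESS)
        fprintf(stderr, "cuInit failed\n");
}

/* Create and destroy a context on one device.
 * Assumption: cuCtxCreate(CUcontext *, unsigned int, CUdevice). */
static void do_ctx(void *arg)
{
    int ordinal = *(int *)arg;
    CUdevice dev;
    CUcontext ctx;

    if (cuDeviceGet(&dev, ordinal) != CUDA_SUCCESS ||
        cuCtxCreate(&ctx, 0, dev) != CUDA_SUCCESS) {
        fprintf(stderr, "context creation failed on device %d\n", ordinal);
        return;
    }
    cuCtxDestroy(ctx);
}

int main(void)
{
    int count = 0, i;

    printf("cuInit: %.3f s\n", time_call(do_init, NULL));
    if (cuDeviceGetCount(&count) != CUDA_SUCCESS)
        return 1;

    /* The reported time covers both context creation and destruction. */
    for (i = 0; i < count; ++i)
        printf("device %d context: %.3f s\n", i, time_call(do_ctx, &i));
    return 0;
}
#endif
```

Something like "gcc -DUSE_CUDA init_time.c -lcuda" should build it (the file name is arbitrary; older glibc versions also need -lrt for clock_gettime). Restricting the visible devices, for example through the CUDA_VISIBLE_DEVICES environment variable, and rerunning would show how the cuInit() time changes with the GPU count.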

I recently reported a similar issue with
CUBLAS initialization times, which might be related
to this one (but maybe that one is due to autotuning).

Btw, is there anything you can do to help? Could this
be an issue on the Linux side, in which case I could
try to solve it myself?

Regards,

Fabien Le Mentec.

I see the same behavior with 8 GeForce GTX 580 cards in our system.

Context initialization time in seconds (rough estimate):

1 GPU:   4s
2 GPUs:  9s
3 GPUs: 14s
4 GPUs: 18s
5 GPUs: 22s
6 GPUs: 26s
7 GPUs: 27s
8 GPUs: 30s

We are creating one thread for each GPU.

I am interested in comments and possible solutions to this problem.
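To put a number on how close to linear the GTX 580 timings above are, a least-squares line can be fitted through them. This is just a back-of-the-envelope sketch: the array and function names are made up for illustration, and the input values are the rough estimates from this post, so the result is only indicative.

```c
#include <stdio.h>
#include <stddef.h>

/* Rough estimates quoted in the post: GPU count vs. init time in seconds. */
static const double gpu_count[8]    = { 1, 2, 3, 4, 5, 6, 7, 8 };
static const double init_seconds[8] = { 4, 9, 14, 18, 22, 26, 27, 30 };

/* Ordinary least-squares fit of y = intercept + slope * x over n samples. */
static void fit_line(const double *x, const double *y, size_t n,
                     double *slope, double *intercept)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    size_t i;

    for (i = 0; i < n; ++i) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    *slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    *intercept = (sy - *slope * sx) / n;
}

/* Print the fitted line for the timings above; call this from main(). */
static void report_fit(void)
{
    double slope, intercept;

    fit_line(gpu_count, init_seconds, 8, &slope, &intercept);
    printf("about %.2f s per additional GPU, %.2f s fixed cost\n",
           slope, intercept);
}
```

For these eight points the fit comes out at roughly 3.7 s per additional GPU plus about 2 s of fixed cost. The increments are 4 to 5 s per GPU up to 6 GPUs and smaller after that, so the growth is close to linear but not exactly so.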