CUDA initialization takes too much time

How much time, exactly? Is this a system with multiple GPUs or large system memory? A Windows or a Linux system? If the latter: Is the CUDA driver in persistence mode? Other than calling cudaGetDeviceCount(), what does magma_init() do?
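
To put a number on "how much time", one option is to wrap the suspect call in a wall-clock timer. The helper below is only a sketch: `time_call_ms` is a hypothetical name, not part of CUDA or MAGMA. The commented usage assumes that the first CUDA runtime call (here the common `cudaFree(0)` idiom) is what triggers context creation, so timing it separately from `magma_init()` should show how much of the delay is CUDA itself.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not a CUDA or MAGMA API): returns the wall-clock
// time, in milliseconds, taken by a single call to f().
template <typename F>
double time_call_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Possible usage on a CUDA system (assumes <cuda_runtime.h>, the MAGMA
// header, and linking against the CUDA runtime and MAGMA):
//
//   double ctx_ms   = time_call_ms([] { cudaFree(0); });   // likely forces context creation
//   double magma_ms = time_call_ms([] { magma_init(); });  // whatever MAGMA adds on top
```

If `ctx_ms` dominates, the delay is probably in driver or context setup rather than in MAGMA, which is where the questions about GPU count, system memory, and persistence mode come in.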