Delay in cuInit

Hi!

I have observed some performance issues with “cold” cuInit invocations:

If you run X, a call to cuInit returns practically immediately.

If you don’t run X, a call to cuInit returns after approximately 1 s, i.e. with a severe delay.

If you don’t run X, but have a program like the following running in the background:

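#include <cuda.h>
#include <unistd.h>

int main(void)
{
   /* Initialize the driver API once; this process then stays attached to the driver. */
   cuInit(0);

   /* Idle forever so the driver always has a connected client. */
   while (1) {
       sleep(1);
   }
   return 0;
}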

then a call to cuInit in my foreground application returns immediately.

Hence, there seems to be some system-global initialization state that takes about a second to enter. Is there a nicer solution than running a daemon like the one above in the background to avoid the cuInit delay when not running X?

The driver unloads a lot of state when no client is connected to it, and it is the re-establishment of that state that takes the time you are seeing. The recommended solution is to run nvidia-smi in daemon mode with a cycle time of a few seconds. This also has the benefit of forcing the driver to retain settings such as compute exclusivity during extended idle periods.
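
To check whether a given workaround actually removes the delay, it may help to time the call directly. The following is only a rough sketch: the helper names time_call_ms and init_driver, and the HAVE_CUDA macro, are made up for illustration and are not part of CUDA. It assumes a POSIX system with clock_gettime. When built with -DHAVE_CUDA and linked with -lcuda it times the real cuInit; without that macro it times a do-nothing stand-in, so the file still compiles on a machine without the toolkit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime / CLOCK_MONOTONIC */
#endif
#include <stddef.h>
#include <time.h>

#ifdef HAVE_CUDA
#include <cuda.h>
#endif

/* Hypothetical wrapper: returns 0 on success.
   With HAVE_CUDA defined it calls the real cuInit; otherwise it is a
   stand-in that does nothing, so the sketch builds without CUDA. */
static int init_driver(void)
{
#ifdef HAVE_CUDA
    return cuInit(0) == CUDA_SUCCESS ? 0 : -1;
#else
    return 0;
#endif
}

/* Returns the wall-clock time in milliseconds spent inside fn(),
   measured with a monotonic clock. If status is non-NULL, the return
   value of fn() is stored there. */
static double time_call_ms(int (*fn)(void), int *status)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (status != NULL) {
        *status = rc;
    }
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling time_call_ms(init_driver, &status) from main and printing the result, once with and once without the background client or nvidia-smi running, should show whether the cold-start cost is gone. Only the first cuInit in a process is interesting here, so each measurement needs a fresh process.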
