CUDA Application Startup Speed on Different Cards

When I compile and test my code on a computer with a GeForce 560 Ti, the software starts up and executes as expected. When I move the generated executable to a new computer with a Quadro K4000, the executable takes over 3 minutes to execute the first GPU calls. (I am using CUDA 4.2.)

My original assumption was that this has something to do with forward compatibility and JIT compiling. However, the same thing happens every time I start up the application, so if that is the case, the JIT-compiled code does not seem to be getting cached anywhere. Can someone explain what is going on?

There are many factors that influence start-up delay of a CUDA program. Here are a few:

  1. Whether device-specific machine code for the GPU in question is available in the fatbinary associated with the application (this determines whether a JIT compile is required).
  2. If a JIT compile is required, the status of the JIT cache, which determines whether the JIT compile will be triggered again on later runs.
  3. If a JIT compile is required, the complexity of the code (which affects compile time).
  4. The size of the system memory (CUDA runtime start-up involves significant memory-management work), and other factors that affect memory-management work, such as OS, 32-bit/64-bit, UVM, UM, etc.
  5. Whether or not an X server is running, and/or whether or not the GPUs are placed in persistence mode.

UM would obviously not be applicable to your CUDA 4.2 case. But since there are a number of factors involved, and you’ve given almost no information about your machine configuration or the differences between the two machines, it’s probably not possible to be any more specific. The usual way to narrow this down is to remove contributing factors one by one. For example, to remove JIT as a possible contributing factor, be sure to compile your code so that it includes machine code for the GPU in question. Although it’s unlikely that GPU wakeup by itself would account for anything approaching a 3-minute difference, you could place your GPUs into persistence mode (if on Linux) to eliminate GPU wakeup time, at least for test purposes. Many possible suggestions and resolution approaches are specific to details like the OS, which you haven’t provided.

Thanks for your help.

Both computers are running Windows 7. One has a development environment, and the other is a brand-new Windows build with the latest drivers. You gave me enough information to start testing on my own. I am going to check the JIT cache size and status.