Why does loading the CUDA libraries sometimes take so long?


Usually, dynamically loading the CUDA DLLs takes only a few seconds in my application. However, sometimes it takes more than 30 seconds until the LoadLibrary call returns.

What could be a possible explanation for this behavior?

Did I put this into the wrong topic? Or does nobody have any clue?

It sometimes happens that questions that provide close to zero information receive no replies because forum participants are not inclined to (1) speculate wildly or (2) spend their time trying to extract actionable information from the asker to progress beyond wild speculation.

(1) Which particular “CUDA DLLs” are affected? Are these very large ones? Is this a controlled experiment, i.e. the same version of the same library on the identical hardware platform, under the same system load?
(2) Which version of CUDA are we talking about?
(3) What kind of hardware platform is this? Actual hardware? A virtualized system from a cloud provider? What kind of mass storage system? Hard disk, SSD, network storage?

If it's the same version of the same library on the same platform, I would guess that the load time for a DLL depends on the load on the mass storage system and on overall system load, in that order. I would recommend a PCIe4 NVMe SSD for fast access, e.g. Samsung 990 PRO. Dynamic linking also requires work by the CPU, so try a system with high single-thread CPU performance (>= 3.5 GHz base frequency).

I am really sorry for having posted a question without providing the necessary information. Let me try to provide it now:
(1) The delay occurs when trying to load the CUDA runtime library (cudart64_110.dll). The behavior is observed using the same version of the libraries and on the same system. Sometimes loading is fast, sometimes it isn’t. The system load is always comparable.
(2) I am using CUDA 11.1.1.
(3) This was observed on an i7-4770S running Windows 10 x64 on an SSD with 16GB RAM. I am pretty sure that it was also observed on an i7-7700 with Windows 10 x64 on SSD with 16GB RAM.

Thank you for pointing out what influences the load time for a DLL. I was just wondering what could cause these delays, as they don’t occur every time.

30 seconds seems like a very long time for a LoadLibrary() call, given that the system platform should have decent performance. I do not know what exactly Windows does under the hood for LoadLibrary(); you may want to inquire on a forum for Windows developers. A system-level profiler might reveal where the delay occurs. I don’t know what Windows 10 offers in this regard. Is dtrace available?

My working hypothesis was that the delay is due to some sort of congestion, either in the file system or in the network in the case of network-attached storage. But if the system is generally lightly loaded, I do not have a ready explanation. I wonder whether some sort of malware detection system could cause this kind of massive impact on LoadLibrary() performance. Or maybe a badly structured DLL search path?
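
One way to check the search-path idea is to probe each directory on PATH for the DLL and time every probe. The following is only a rough sketch in Python; the helper names probe_search_path and path_directories are made up for illustration. It also covers only the PATH portion of the search: Windows normally looks in the application directory and the system directories first, so this is not a reproduction of the real loader behavior. A directory that takes seconds to answer (for example an unreachable network share) would stand out in the output.

```python
import os
import time


def probe_search_path(dll_name, directories):
    """Check each directory for dll_name, in order.

    Returns a list of (directory, found, seconds) tuples so that slow
    directories are visible even when the DLL is not located there.
    """
    results = []
    for directory in directories:
        start = time.perf_counter()
        found = os.path.isfile(os.path.join(directory, dll_name))
        results.append((directory, found, time.perf_counter() - start))
    return results


def path_directories():
    """Return the non-empty entries of the PATH environment variable."""
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]


# Example use (DLL name taken from this thread):
# for directory, found, seconds in probe_search_path("cudart64_110.dll",
#                                                    path_directories()):
#     print(f"{seconds:8.4f} s  found={found}  {directory}")
```

If the DLL only shows up late in the list, or an early entry is slow to answer, moving the CUDA bin directory forward or loading the DLL by full path would be a cheap experiment.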

Another hypothesis may be that the measurement framework is flawed, causing time to be misattributed. Maybe multiple DLLs are loaded with LoadLibrary() and there is a different (larger) DLL that loads more slowly. Or there is a LoadLibrary(), GetProcAddress() sequence where time is incorrectly attributed between these two API calls.
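
To rule out misattribution, the load and the symbol lookup can be timed as two separate steps. Below is a minimal sketch in Python using ctypes, not the code from the application in question. The names timed and load_and_resolve are hypothetical. On Windows, ctypes.WinDLL asks the OS to load the DLL, and attribute access on the returned object resolves the exported symbol. Note that Python 3.8 and later do not search PATH for a bare DLL name by default, so a full path may be needed.

```python
import ctypes
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def load_and_resolve(dll_name, symbol, loader=None):
    """Time the library load and the symbol lookup separately.

    loader is any callable that takes the DLL name and returns a library
    object; it defaults to ctypes.WinDLL, which exists only on Windows.
    """
    if loader is None:
        loader = ctypes.WinDLL
    lib, load_seconds = timed(loader, dll_name)
    func, resolve_seconds = timed(getattr, lib, symbol)
    return {"load_s": load_seconds, "resolve_s": resolve_seconds, "func": func}


# Example use on Windows (the path is an assumption, adjust as needed):
# r = load_and_resolve(r"C:\path\to\cudart64_110.dll", "cudaGetDeviceCount")
# print(r["load_s"], r["resolve_s"])
```

If load_s stays small while the application still sees a 30-second stall, the time is probably being spent somewhere other than in the load of this particular DLL.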

You may want to run some dedicated tests to see whether there is any pattern to slow LoadLibrary() calls, such as every Nth call is slow, or the slow calls are always the first call in the app, in which case one might suspect some sort of cold-cache effect.
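
Such a test could look roughly like the Python sketch below. All names (time_calls, slow_call_indices, describe_pattern) and the factor-of-10 threshold are arbitrary choices for illustration. One caveat: a repeated LoadLibrary() of an already loaded DLL within the same process returns quickly, so for this purpose the timed action should probably start a fresh process each time (for instance via subprocess.run) or free the library between calls.

```python
import statistics
import time


def time_calls(action, repeats):
    """Run action() repeats times and return the duration of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def slow_call_indices(durations, factor=10.0):
    """Return indices of calls that took more than factor times the median."""
    median = statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > factor * median]


def describe_pattern(slow):
    """Give a rough description of where the slow calls occur."""
    if not slow:
        return "no slow calls"
    if slow == [0]:
        return "only the first call is slow (cold-cache suspect)"
    gaps = {b - a for a, b in zip(slow, slow[1:])}
    if len(slow) > 1 and len(gaps) == 1:
        return f"slow calls recur every {gaps.pop()} calls"
    return "no obvious pattern"


# Example use, where run_loader is a hypothetical function that starts
# a small helper process which loads the DLL and exits:
# durations = time_calls(run_loader, 50)
# print(describe_pattern(slow_call_indices(durations)))
```

Logging the raw durations together with wall-clock timestamps would also allow correlating slow calls with other activity on the machine, such as scheduled scans.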