Hey,
I’m using the bug report template, so all the relevant information should be here.
- Operating System
Windows XP x64 Professional 2003 SP2
- Synopsis description of the problem
CUDA runtime functions fail after a certain number of different DLLs that contain nvcc-compiled code (and link to cudart.dll) have been loaded with LoadLibrary().
- Detailed description of the problem
If one loads a certain number of DLLs that contain nvcc-compiled code (i.e. kernels), there seems to be a per-process limit after which CUDA runtime functions start to fail.
How to reproduce:
Basically, it is enough to load a few DLLs that contain a lot of nvcc-compiled code (several thousand lines). The size of any particular kernel does not seem to matter; only the combined size of all kernels in all DLLs does (as if there were a per-process limit on GPU code). It may be possible to trigger this bug with a single DLL that is large enough.
To help investigate the problem, I created a VS 2005 project that contains a DLL with a kernel and a starter executable (which is not linked to the CUDA runtime). Using a shell script (you need an sh-like shell for that, e.g. the one from Cygwin), you can replicate the DLL 500 times under different names. The starter executable loads all of them at once and fails after loading the N-th one.
For convenience, a set of 500 already replicated copies of the DLL (given different names so that LoadLibrary() treats them as different modules) plus the compiled starter application (all 64-bit Windows binaries) are provided in separate .rar files (caution: the unpacked size is 160+ MB).
IMPORTANT NOTE: the problem does not appear if you load a single DLL 500 times. You have to replicate it into 500 different DLL files and load them all at once.
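The replication step described above can be sketched roughly as follows. This is not the script shipped in the archive, only a minimal illustration that assumes plain copies under distinct names are enough to make LoadLibrary() treat them as separate modules. The function name replicate_dll, the kernel_NNN.dll naming pattern, and the file and directory names in the usage comment are all hypothetical.

```shell
#!/bin/sh
# Sketch only: copy one DLL under many distinct names.
# replicate_dll and the kernel_NNN.dll pattern are hypothetical names,
# not taken from the attached project.
#
# Usage (file and directory names are placeholders):
#   replicate_dll CudaKernel.dll 500 replicated
replicate_dll() {
    src=$1      # path to the original DLL
    count=$2    # number of copies to create
    outdir=$3   # directory that receives the copies

    mkdir -p "$outdir" || return 1

    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-padded index keeps the copies sorted: kernel_001.dll, ...
        cp "$src" "$outdir/$(printf 'kernel_%03d.dll' "$i")" || return 1
        i=$((i + 1))
    done
}
```

The starter executable would then call LoadLibrary() on each of the generated files in turn and keep every handle open, so that all copies stay loaded in the process at the same time.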
- CUDA toolkit release version
Tested 1.1 and 2.0-beta. Fails with both.
- SDK release version
Tested 1.1 and 2.0-beta. Fails with both.
- Compiler for CPU host code
Microsoft (R) C/C++ Optimizing Compiler Version 14.00.50727.42 for x64
- System description including:
CPU type, CPU speed, installed system RAM, system type and model, video cards installed in the system, chipset type
Intel Core2 Quad Q6600 2.53 GHz, 4 GB RAM, Intel chipset (I’m sorry I can’t give the chipset version right now, but it seems to be irrelevant to the problem anyway), two display adapters:
NVIDIA GeForce 8800 GT (PCI-Express, used for CUDA, no display attached)
NVIDIA GeForce 7950 GT (PCI-Express, used for display)
CudaSizeLimitTest_ReplicatedDLLs.rar (525 KB)
CudaSizeLimitTest.rar (39.1 KB)