Total GPU code size limit per process? CUDA fails after loading too much code


I’m using the bug report template, so all the information is below.

- Operating System

Windows XP x64 Professional 2003 SP2

- Synopsis description of the problem

CUDA runtime functions fail after loading (with LoadLibrary()) a certain number of different DLLs that contain nvcc-compiled code (and link to cudart.dll).

- Detailed description of the problem

If one loads a certain number of DLLs that contain nvcc-compiled code (i.e. kernels), there seems to be a per-process limit after which CUDA runtime functions start to fail.

How to reproduce:
It is enough to load a few DLLs that each contain a lot of nvcc-compiled code (several thousand lines). The size of any particular kernel does not seem to matter; only the combined size of all kernels in all DLLs does, as if there were a per-process limit on GPU code. It may be possible to trigger this bug with a single DLL that is large enough.

To help investigate the problem, I created a VS 2005 project that contains a DLL with a kernel and a starter executable (which is not linked to the CUDA runtime). Using a shell script (you need an sh-like shell for that, e.g. the one from Cygwin), you can replicate the DLL 500 times under different names. The starter executable loads all of them at once and fails after loading the N-th one.
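If you don't have an sh-like shell at hand, the replication step can also be done with a small C++17 program. This is only a sketch of the idea, not the script from the attached archive: the function names, the "CudaDll" prefix and the zero-padded numbering are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Build the file name of the i-th copy, e.g. "CudaDll_007.dll".
// The naming scheme is hypothetical; any scheme that yields distinct names works.
std::string replica_name(const std::string& prefix, int index) {
    char suffix[32];
    std::snprintf(suffix, sizeof(suffix), "_%03d.dll", index);
    return prefix + suffix;
}

// Copy `source` into `out_dir` `count` times under distinct names and
// return the paths of the copies in creation order.
std::vector<fs::path> replicate_dll(const fs::path& source,
                                    const fs::path& out_dir,
                                    const std::string& prefix,
                                    int count) {
    assert(count >= 0);
    fs::create_directories(out_dir);
    std::vector<fs::path> copies;
    for (int i = 0; i < count; ++i) {
        fs::path target = out_dir / replica_name(prefix, i);
        // Overwrite stale copies left behind by an earlier run.
        fs::copy_file(source, target, fs::copy_options::overwrite_existing);
        copies.push_back(target);
    }
    return copies;
}
```

Calling something like replicate_dll("CudaDll.dll", "replicas", "CudaDll", 500) would then produce the 500 differently named files (the source file name here is a placeholder).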

For convenience, a set of 500 already replicated copies of the DLL (given different names so that LoadLibrary() treats them as different) and the compiled starter application (all 64-bit Windows executables) are provided in separate .rar files (caution: the unpacked size is 160+ MB).

IMPORTANT NOTE: the problem does not appear if you load a single DLL 500 times. You have to replicate it to create 500 different DLLs and load them all at once.
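For reference, the loading loop can be sketched roughly as below. This is not the source of the attached starter, just a minimal illustration under my own assumptions: the names Loader, LoadResult, load_until_failure and win32_load are made up, and a loader that returns nullptr stands for "this DLL failed", whether the failure comes from LoadLibrary() itself or from a CUDA call made inside the DLL. The handles are deliberately never freed, so all distinct DLLs stay loaded at once.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// A loader takes a library path and returns an opaque handle,
// or nullptr when loading (or a check done after loading) fails.
using Loader = std::function<void*(const std::string&)>;

struct LoadResult {
    std::size_t loaded = 0;      // number of libraries loaded before the first failure
    std::vector<void*> handles;  // kept so that every library stays resident
};

// Load each path in order and stop at the first failure.
// Nothing is unloaded here: the bug needs all DLLs loaded simultaneously.
LoadResult load_until_failure(const std::vector<std::string>& paths,
                              const Loader& load) {
    LoadResult result;
    for (const std::string& path : paths) {
        void* handle = load(path);
        if (handle == nullptr) break;
        result.handles.push_back(handle);
        ++result.loaded;
    }
    assert(result.loaded == result.handles.size());
    return result;
}

#ifdef _WIN32
// Windows loader built on LoadLibraryA; compiled only on Windows.
void* win32_load(const std::string& path) {
    return reinterpret_cast<void*>(LoadLibraryA(path.c_str()));
}
#endif
```

On Windows you would call load_until_failure(names, win32_load) with the 500 replicated file names; result.loaded then tells you how many DLLs went in before the first failure. Keeping the loader as a parameter makes it easy to swap in one that also calls an exported function of the DLL after loading it.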

- CUDA toolkit release version

Tested 1.1 and 2.0-beta. Fails with both.

- SDK release version

Tested 1.1 and 2.0-beta. Fails with both.

- Compiler for CPU host code

Microsoft (R) C/C++ Optimizing Compiler Version 14.00.50727.42 for x64

- System description including:
CPU type, CPU speed, installed system RAM, system type and model, video cards installed in the system, chipset type

Intel Core 2 Quad Q6600 2.53 GHz, 4 GB RAM, Intel chipset (I’m sorry I can’t give the chipset version right now, but it seems to be irrelevant to the problem anyway), two display adapters:

NVIDIA GeForce 8800 GT (PCI-Express, used for CUDA, displayless)
NVIDIA GeForce 7950 GT (PCI-Express, used for display)
CudaSizeLimitTest_ReplicatedDLLs.rar (525 KB)
CudaSizeLimitTest.rar (39.1 KB)