As some of you may recall (most of you won’t), I’ve been working on a rather extensive project that will eventually (hopefully) be able to use and manage many NVidia GPUs, using a proprietary algorithm that took me a little more than 2 years to develop.
The good news is that I’ve finished the coding for this project. Yay!
So now I’m at the debugging stage, and I’ve encountered what can only be described as a rather nasty surprise. Bit of a nightmare, actually.
For reasons that I won’t go into here, I’m using the NVidia Driver API to communicate with any and all of the NVidia GPU(s) in the system, which so far has been working just great.
The first time the program calls the NVidia Driver API function cuMemHostAlloc(), it requests 415,248,384 bytes of mapped, pinned Host memory. That call works, and I get the memory.
The second time the program calls cuMemHostAlloc(), however, it requests 1,660,993,536 bytes of mapped, pinned Host memory.
That’s when the unexpected happens. The function call never returns. It just reboots my entire system!
In fact, if I break up the call to cuMemHostAlloc() into four separate calls of 415,248,384 bytes each (4 × 415,248,384 = 1,660,993,536), one of the calls (not the first) will always reboot my system, whether I’m running under the debugger or not! I know I should be able to tell you which one of the calls reboots the system, but hey, I’m still in shock that any call to an NVidia Driver API function can reboot my system! Besides, I can only discern which call blows up the OS if I run the program under the debugger, and the debugger, as someone pointed out in another memory-related thread, might be aggravating the problem, so that wouldn’t tell me much anyway…
So now for some of the details. The program is a 32-bit program compiled using Visual C++ 2010 Express, and running under a 64-bit Windows 7 OS. And for the record, it’s compiled with the ‘Enable C++ Exceptions’ set to ‘No’, but since the NVidia Driver API is a ‘C’ language interface, that shouldn’t matter…(right?)
The hardware is a Dell Inspiron N7110 Laptop with eight (8) gigabytes of installed RAM, an Intel 2.4 GHz Dual-Core CPU (four logical processors), and a single NVidia GeForce GT 525M GPU with 1,073,545,216 bytes of Adapter RAM, and two (2) streaming multiprocessors (96 CUDA Cores, I think).
The program is being linked with the /LARGEADDRESSAWARE linker flag, and the Windows 7 Boot Loader’s ‘IncreaseUserVa’ setting, as reported by bcdedit, is properly set to 3072, so the program should be able to allocate up to three (3) gigabytes of Windows memory.
When the latest tests were run, the Windows ‘System Information’ utility reported that there was 5.90 Gigabytes of ‘Available Physical Memory’, so the memory was definitely there.
Also, more for the record than anything else, the first call to the cuMemHostAlloc() function, which works fine, uses the following flags:
CU_MEMHOSTALLOC_PORTABLE | CU_MEMHOSTALLOC_DEVICEMAP | CU_MEMHOSTALLOC_WRITECOMBINED
The second call, and/or the next four calls, only use these:
CU_MEMHOSTALLOC_PORTABLE | CU_MEMHOSTALLOC_DEVICEMAP
More grist for the mill: On my system, the Windows ‘Startup and Recovery’ option for ‘System failure’ is set to ‘Automatically restart’, so the reboot is probably Windows restarting after a system failure. But why is the cuMemHostAlloc() function causing a system failure??
Lastly, the version of the NVidia Driver that the program links to is 307.21 (‘File’ version: 220.127.116.111).
So can anyone help? Any ideas? Anything?