Very different pinned memory limits with SLI enabled vs SLI disabled?

Dear All … interesting problem …

I’m using VS2015, 64 GB of system RAM, a 64-bit compiler/toolchain, CUDA 8, 3 x Maxwell Titan X, and Windows 10 64-bit. With SLI enabled, I find I can allocate up to approx. 32 GB of pinned system memory (which is about right, as I believe Windows has a limit of approx. half of total RAM on this). This pinned memory seems to work well (I use it to transfer data from the 3 GPUs into CPU memory at the same time with async memcpy).

However, if I disable SLI (which I have been told is the way CUDA should be run), then allocation runs out of memory after only 10 GB of pinned memory. I make no changes to the code, and I don’t need to recompile, so the difference is isolated to SLI vs. no SLI. I have tried different ways of increasing the Windows pinned memory limit without SLI, but can’t seem to change it.
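For anyone who wants to reproduce the measurement described above, here is a minimal sketch of a probe. It is not the original poster's code. It assumes pinned memory is requested in 1 GiB chunks through `cudaHostAlloc` with the `cudaHostAllocPortable` flag, and the names `chunkBytes` and `maxChunks` are made up for illustration. Allocating in chunks may give a different total than one large allocation would.

```cuda
// Probe roughly how much pinned (page-locked) host memory can be allocated.
// Sketch only: chunk size and the safety cap are arbitrary assumptions.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t chunkBytes = size_t(1) << 30;  // 1 GiB per allocation (assumed)
    const size_t maxChunks  = 64;               // safety cap (assumed)
    std::vector<void*> chunks;

    while (chunks.size() < maxChunks) {
        void* p = nullptr;
        // Portable: the memory is treated as pinned by all CUDA contexts.
        cudaError_t err = cudaHostAlloc(&p, chunkBytes, cudaHostAllocPortable);
        if (err != cudaSuccess) {
            std::printf("Allocation failed: %s\n", cudaGetErrorString(err));
            cudaGetLastError();  // reset the error state
            break;
        }
        chunks.push_back(p);
    }

    std::printf("Pinned %llu GiB before stopping\n",
                (unsigned long long)chunks.size());

    // Release everything that was pinned.
    for (void* p : chunks) {
        cudaFreeHost(p);
    }
    return 0;
}
```

Running the same binary once with SLI enabled and once with SLI disabled would show whether the reported totals differ on a given machine.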

Is there a reason why SLI enabled would allow so much more pinned memory? And is there a way of doing it without enabling SLI?

Thank you all …

The GPUs are apparently in WDDM mode; try switching to TCC mode.
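To confirm which driver model each GPU is actually using, a small sketch along these lines may help. It relies on the `tccDriver` field of `cudaDeviceProp`, which the runtime sets to 1 for devices running under the TCC driver. On Windows, a value of 0 is taken here to mean WDDM, which is an assumption of the sketch.

```cuda
// Report, per device, whether the TCC driver is in use.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, d);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "Device %d: %s\n", d, cudaGetErrorString(err));
            continue;
        }
        // tccDriver == 1 means TCC; 0 is assumed to mean WDDM on Windows.
        std::printf("Device %d (%s): %s\n", d, prop.name,
                    prop.tccDriver ? "TCC" : "not TCC (WDDM on Windows)");
    }
    return 0;
}
```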

Thank you, that might work, but I really need them in WDDM mode so I can use them for display as well. I’m just wondering why, in WDDM mode, I can allocate so much pinned memory with SLI enabled but so little with SLI disabled. I would have expected the opposite. Surely having SLI disabled should work much better …

Side note: I would question the wisdom of pinning GBs of system memory. Usually one would not want to constrain an operating system so severely, since operating systems are designed around virtualized memory.

The CUDA facilities for allocating pinned system memory are just thin wrappers around the operating system facilities, and in my experience, those OS facilities are a bit of a black box in terms of the amount of memory that can be pinned. It seems to depend on all kinds of system state, so I am wondering whether you are really observing a cause-effect relationship with the SLI settings or just an artifact.
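One way to poke at that black box is to pin memory through a second path and see whether the behavior is the same. The sketch below pins an ordinary `malloc` buffer with `cudaHostRegister` instead of allocating it with `cudaHostAlloc`. The 256 MiB size is arbitrary, and whether this path hits the same limit is exactly what the experiment would show; I am not claiming it does.

```cuda
// Pin an existing host buffer with cudaHostRegister, then unpin it.
// Sketch only: the buffer size is an arbitrary assumption.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = size_t(256) << 20;  // 256 MiB (assumed)

    void* buf = std::malloc(bytes);
    if (buf == nullptr) {
        std::fprintf(stderr, "malloc failed\n");
        return 1;
    }

    // Ask the driver to page-lock the existing allocation.
    cudaError_t err = cudaHostRegister(buf, bytes, cudaHostRegisterPortable);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaHostRegister failed: %s\n",
                     cudaGetErrorString(err));
        std::free(buf);
        return 1;
    }
    std::printf("Registered %llu MiB as pinned memory\n",
                (unsigned long long)(bytes >> 20));

    // Unpin before handing the memory back to the C runtime.
    cudaHostUnregister(buf);
    std::free(buf);
    return 0;
}
```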

Are the observations repeatable (1) after a fresh boot and (2) after a day of heavy use? If so, consider filing an enhancement request with NVIDIA. While I am not sure that something could actually be done about the issue from the NVIDIA side, it may depend a bit on which flags the NVIDIA driver uses when it invokes the operating system’s pinning facilities.

Yes, I think you’re right on your points about the pinned memory limit and SLI. However, it does not seem to depend on the state of the system at all. It is pretty consistent that with SLI disabled only 10 GB is available, while with SLI enabled 32 GB is available. It might be something to do with an SLI group being only “one” card in terms of virtual memory to Windows, while without SLI there are 3 separate cards … which perhaps means more pinned memory is used to track all three (although the memory management system tells me no extra visible memory is being used, so it shouldn’t be …).

Also, there’s 64 GB of system memory in total, so taking 32 GB as pinned memory really is no problem for Windows 10 (it can happily run very smoothly on 16 GB of memory). In fact, I would like to allocate more pinned memory! But 32 GB total pinned is fine for 3 cards with 12 GB onboard each.

may be of interest:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#sli-interoperability

exactly … read this with interest, thank you.

Basically, this says that running with SLI enabled should be suboptimal in terms of device memory, and that it is perhaps best not to use it. I have found that device memory is allocated across the 3 devices with SLI enabled absolutely fine (all the kernels run independently within the GPUs, and each GPU’s memory seems very distinct). The async memcpy from all the GPUs to CPU memory is very fast and happens concurrently. I find that having SLI enabled allows me to pin a lot more CPU memory. Windows of course still runs fine, given that it has the other 32 GB of free memory available to it.
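For readers unfamiliar with the transfer pattern mentioned above, here is a rough sketch of concurrent device-to-host copies from several GPUs into one pinned buffer. It is an illustration under assumptions, not the poster's actual code: each GPU gets its own stream and its own slice of the buffer, and the `CHECK` macro and the 256 MiB per-GPU size are made up for the example.

```cuda
// Copy data from every GPU into one pinned host buffer concurrently.
// Sketch only: sizes, names, and the CHECK macro are assumptions.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(e_));                     \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count == 0) return 0;

    const size_t bytesPerGpu = size_t(256) << 20;  // 256 MiB per GPU (assumed)

    // One pinned buffer, portable so every device context sees it as pinned.
    char* host = nullptr;
    CHECK(cudaHostAlloc((void**)&host, bytesPerGpu * count,
                        cudaHostAllocPortable));

    std::vector<void*> dev(count, nullptr);
    std::vector<cudaStream_t> streams(count);

    for (int d = 0; d < count; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaMalloc(&dev[d], bytesPerGpu));
        CHECK(cudaMemset(dev[d], d + 1, bytesPerGpu));  // recognizable fill
        CHECK(cudaDeviceSynchronize());
        CHECK(cudaStreamCreate(&streams[d]));
    }

    // Queue all copies first so they can overlap, one stream per GPU.
    for (int d = 0; d < count; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaMemcpyAsync(host + d * bytesPerGpu, dev[d], bytesPerGpu,
                              cudaMemcpyDeviceToHost, streams[d]));
    }

    // Wait for each copy, then spot-check the first byte of each slice.
    for (int d = 0; d < count; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaStreamSynchronize(streams[d]));
        std::printf("GPU %d slice starts with byte value %d\n", d,
                    (int)host[d * bytesPerGpu]);
    }

    for (int d = 0; d < count; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaStreamDestroy(streams[d]));
        CHECK(cudaFree(dev[d]));
    }
    CHECK(cudaFreeHost(host));
    return 0;
}
```

The copies can only run asynchronously because the destination is pinned, which is why the pinned memory limit matters for this workload.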

It seems really crazy that Windows doesn’t allow more than 10 GB of pinned memory out of 64 GB if SLI is disabled, but somehow, in the more complicated state of SLI enabled, it allows up to 32 GB with no problems at all.