Hello, I have a custom server that I built with 3 Tesla K40s for some high-intensity simulation processing. The system blue-screens any time I have the full 1TB of RAM installed, and I have determined this to be the fault of the K40 driver.
The problem is that this is a quad-channel system. If I remove a stick of RAM and bring the system down to 960GB, the memory falls back to a single-channel configuration, which hinders performance noticeably (about 20%). I can bring the system down to 512GB of RAM and everything runs optimally, but then I can’t run the larger simulations that need the full 1TB of memory space, which is what I built the system for. 20% doesn’t seem like much, but in this instance it is a matter of days, and some of these simulations are time-critical.
I was wondering whether there is an environment variable (or something like it) for the K40 or its driver that I could set manually so that it only has access to 960GB of the system RAM. That way I could have the full 1TB plugged into the system, the motherboard and processors would still operate in the quad-channel configuration, and the system wouldn’t blue-screen due to the NVIDIA driver limitation.
Any assistance would be appreciated, thanks in advance!