cuDevicePrimaryCtxRetain returns CUDA_ERROR_OUT_OF_MEMORY

Hi,

I have a system with 12 RTX 3060 cards, each with 12 GB of memory.

When I call cuDevicePrimaryCtxRetain for each of them, the last 2 GPUs fail with CUDA_ERROR_OUT_OF_MEMORY. The host has 4 GB of RAM. Is there a way to reduce the memory required for the context?
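For reference, the failing pattern is essentially this (a minimal sketch of the driver-API calls involved; my actual code has more error handling):

```c
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUresult rc = cuInit(0);
    if (rc != CUDA_SUCCESS) { fprintf(stderr, "cuInit failed: %d\n", rc); return 1; }

    int count = 0;
    cuDeviceGetCount(&count);  /* reports 12 on this system */

    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        CUcontext ctx;
        cuDeviceGet(&dev, i);
        /* Fails with CUDA_ERROR_OUT_OF_MEMORY for the last 2 devices */
        rc = cuDevicePrimaryCtxRetain(&ctx, dev);
        if (rc != CUDA_SUCCESS) {
            const char *name = NULL;
            cuGetErrorName(rc, &name);
            fprintf(stderr, "device %d: %s\n", i, name ? name : "unknown error");
        }
    }
    return 0;
}
```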

Thanks,
Daniel

Is it really 4 GB? In my experience, a present-day system can get by with 8 GB of system memory for light desktop duty, and 32 GB for a workstation-class environment doing some heavy computational lifting (possibly involving GPUs). How much memory is available to user processes after booting the operating system? What is the operating system?

According to conventional wisdom, a well-balanced GPU-accelerated system should have more system memory than the total memory of all GPUs combined, ideally 2x to 4x that amount depending on the use case. What is the use case envisioned for this 12-GPU system?

Hi, yes, it is 4 GB, on Windows 11. I develop a trading system for fast-moving markets such as futures and forex. With 8 GB it seems to work fine. I was just wondering whether there are options to reduce the memory consumption of the CUDA/driver interface.

Thanks - will go with 8GB.

I would look at it from the opportunity cost perspective: The cost difference between equipping the system with 8 GB instead of 4 GB is minimal to non-existent (when viewed across multiple DRAM suppliers), and the value of any time spent trying to squeeze the CUDA software stack for 12 GPUs into the smaller system memory is likely to exceed that cost.

To my knowledge, there are no software configuration knobs that reduce the memory requirements of the CUDA software stack. These requirements are quite modest and easily met by any reasonably configured system. I will note that I would not consider a system with 8 GB of system memory and 12 GPUs to be reasonably configured.

Under-provisioning of system memory is a fairly common performance-reducing issue in GPU-accelerated systems. 12 GPUs can chew through a lot of data, and that data needs to be shuffled in and out of the GPUs via system memory (given that you are using consumer parts under Windows, I take it as a given that you are not using GPUDirect), even if system memory just serves as a buffer for data coming in from disk and the network.
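To make the buffering point concrete, a typical staging path looks something like the sketch below (illustrative only; the function name and the pinned-buffer pattern are my assumptions, not anything from your code). Every byte that reaches a GPU transits a host-side buffer first, and with 12 GPUs each needing its own staging buffers, the demand on system RAM multiplies.

```c
#include <cuda.h>
#include <stddef.h>

/* Illustrative staging path: disk/network -> pinned host buffer -> GPU. */
int stage_to_device(CUdeviceptr d_buf, size_t bytes, CUstream stream)
{
    void *h_buf = NULL;
    /* Pinned (page-locked) host memory is needed for async copies and
     * counts directly against system RAM. */
    CUresult rc = cuMemHostAlloc(&h_buf, bytes, 0);
    if (rc != CUDA_SUCCESS) return -1;

    /* ... fill h_buf from disk or the network feed ... */

    rc = cuMemcpyHtoDAsync(d_buf, h_buf, bytes, stream);
    if (rc != CUDA_SUCCESS) { cuMemFreeHost(h_buf); return -1; }

    cuStreamSynchronize(stream);  /* wait before reusing or freeing the buffer */
    cuMemFreeHost(h_buf);
    return 0;
}
```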

Now, conventional wisdom and common observations may not apply to a specific (and possibly very unusual) use case. But the relationship between system memory size and the performance of this CUDA-accelerated application is something you might want to keep an eye on.