Cudf spill-on-demand couldn't find memory device to spill: <SpillManager device_memory_limit= | spilled | unspilled>

pham_hung2 · May 7, 2025, 2:30pm

Hello,

I ran into this:

[WARNING] RMM allocation of 2GiB failed, spill-on-demand couldn’t find any memory device to spill: <SpillManager device_memory_limit=372.53GiB | 763.09MiB spilled | 0B unspilled (unspillable)>
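For anyone scanning logs for this warning, here is a minimal sketch of pulling the three figures out of the SpillManager summary. The helper name `parse_spill_manager` and the regular expression are hypothetical; they only mirror the warning text quoted above and are not an official cuDF log format.

```python
import re

# Hypothetical pattern, written against the warning text quoted in this post.
_SPILL_RE = re.compile(
    r"device_memory_limit=(?P<limit>\S+) \| "
    r"(?P<spilled>\S+) spilled \| "
    r"(?P<unspilled>\S+) unspilled"
)


def parse_spill_manager(line: str) -> dict:
    """Return the limit, spilled and unspilled figures as strings."""
    match = _SPILL_RE.search(line)
    if match is None:
        raise ValueError("no SpillManager summary found in line")
    return match.groupdict()


warning = (
    "RMM allocation of 2GiB failed, spill-on-demand couldn't find any "
    "memory device to spill: <SpillManager device_memory_limit=372.53GiB "
    "| 763.09MiB spilled | 0B unspilled (unspillable)>"
)
stats = parse_spill_manager(warning)
print(stats)
```

Read literally, the `0B unspilled` figure suggests that the spill manager had nothing left on the device to move when the allocation failed.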

which occurred at this line of code:

cudf_df = cudf.DataFrame(data_dict)

where data_dict is:

data_dict = {
    'row_id': <list of 15 million row ids>,
    'text': <list of 15 million text records>,
}

which resulted in this error:

MemoryError: std::bad_alloc: out_of_memory: CUDA error at …/lib//python3.10/site-packages/librmm/include/rmm/rm/device/cuda_memory_resource.hpp:62: cudaErrorMemoryAllocation out of memory
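As a rough sense of scale, the arithmetic below divides the failed allocation by the row count. It assumes, without confirmation from the logs, that the 2GiB request was for the character data of the 'text' column, and it uses the rounded 2GiB figure from the warning.

```python
# Figures taken from the post: a 2GiB failed allocation and 15 million rows.
failed_alloc_bytes = 2 * 2**30
num_rows = 15_000_000

# Assumption: the failed allocation holds the 'text' column's characters.
avg_bytes_per_row = failed_alloc_bytes / num_rows
print(f"{avg_bytes_per_row:.0f} bytes per text record")
```

Under that assumption the records would average somewhere around 143 bytes each.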

I have configured a LocalCUDACluster and enabled spill-on-demand:

LocalCUDACluster(
    CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7',
    rmm_pool_size=0.9,
    enable_cudf_spill=True,
    device_memory_limit=400000000000,
    cudf_spill_statistics=2,
    local_directory='./cudf_spill_storage',
    log_spilling=False,
)
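The device_memory_limit above is given in bytes. A quick conversion, sketched below, shows that it is the same value as the 372.53GiB reported in the SpillManager warning.

```python
# device_memory_limit from the cluster configuration above, in bytes.
device_memory_limit = 400_000_000_000

# 1 GiB is 2**30 bytes.
limit_gib = device_memory_limit / 2**30
print(f"{limit_gib:.2f}GiB")  # 372.53GiB
```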

What I notice in my log is that the “RMM allocation … failed” warning above comes from the function ‘_out_of_memory_handle’, which gives up after two attempts to find device memory (buffers) to spill.

I also notice that SpillManager created only 3 buffers. Why only 3 buffers when there is plenty of memory available on the host? Is 3 buffers a default setting somewhere? (I tried reading the source of the SpillManager and SpillableBufferOwner classes but have not found it yet, so any pointers are appreciated. Thanks in advance.)

I also tried disabling spilling by setting ‘device_memory_limit=0’ (because there is ample memory on the host) but got the exact same error. It seems to me that the limit of only 3 buffers is where the failure starts, regardless of the ‘device_memory_limit’ setting.

Am I missing something? Any advice or pointers are greatly appreciated. Thank you.

sophwats · May 8, 2025, 11:10am

Hi @pham_hung2,

Are you seeing this error whilst trying to use a NeMo Microservice, or are you just using RAPIDS/cuDF?

Thanks,

Sophie

pham_hung2 · May 8, 2025, 5:54pm

Hi Sophie,

Thanks for replying. I got that error while using RAPIDS/cuDF, NOT the microservice. My objective was to load a 15-million-record dataset using cuDF and then prep it using NeMo Curator.

Thanks for your help with this.
Hung

sophwats · May 9, 2025, 2:50pm

Hi Hung,

I’ve chatted to the RAPIDS/cuDF team - please can you ask your question in the RAPIDS Slack - they can support you there.

Best,

Sophie
