Hello,
I ran into this:
[WARNING] RMM allocation of 2GiB failed, spill-on-demand couldn't find any memory device to spill: <SpillManager device_memory_limit=372.53GiB | 763.09MiB spilled | 0B unspilled (unspillable)>
which occurred at this line of code:
cudf_df = cudf.DataFrame(data_dict)
where data_dict is:
data_dict = {
    'row_id': <list of 15 million row ids>,
    'text': <list of 15 million text records>
}
which resulted in this error:
MemoryError: std::bad_alloc: out_of_memory: CUDA error at …/lib//python3.10/site-packages/librmm/include/rmm/mr/device/cuda_memory_resource.hpp:62: cudaErrorMemoryAllocation out of memory
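For scale, here is a rough back-of-envelope estimate of the device memory that dict needs once it becomes a cuDF DataFrame. The 8 bytes per row id (int64) and ~200-byte average text length are my assumptions for illustration, not measured values:

```python
# Back-of-envelope device-memory estimate for the DataFrame construction.
# Assumptions: int64 row ids (8 bytes each) and an average text record
# length of ~200 bytes -- adjust both for your real data.
N_ROWS = 15_000_000
BYTES_PER_ROW_ID = 8     # int64
AVG_TEXT_BYTES = 200     # assumed average record length

row_id_bytes = N_ROWS * BYTES_PER_ROW_ID
# A cuDF string column stores the character data plus one int32 offset
# per row (+1), so add 4 bytes/row of offsets on top of the characters.
text_bytes = N_ROWS * AVG_TEXT_BYTES + (N_ROWS + 1) * 4

total_gib = (row_id_bytes + text_bytes) / 2**30
print(f"row_id column: {row_id_bytes / 2**30:.2f} GiB")
print(f"text column  : {text_bytes / 2**30:.2f} GiB")
print(f"total        : {total_gib:.2f} GiB")
```

So even under these guesses the text column alone is a multi-GiB, effectively single-shot allocation, which would line up with the 2GiB allocation in the warning above.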
I have configured a LocalCUDACluster and enabled spill-on-demand:
LocalCUDACluster(
    CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7',
    rmm_pool_size=0.9,
    enable_cudf_spill=True,
    device_memory_limit=400000000000,
    cudf_spill_stats=2,
    local_directory='./cudf_spill_storage',
    log_spilling=False,
)
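As a sanity check that the limit is being picked up, converting the `device_memory_limit` I passed into GiB matches the figure the SpillManager reports in the warning:

```python
# The byte value passed to LocalCUDACluster, converted to GiB (2**30 bytes).
device_memory_limit = 400_000_000_000
gib = device_memory_limit / 2**30
print(f"{gib:.2f} GiB")  # matches "device_memory_limit=372.53GiB" in the warning
```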
What I notice in my log is that the "RMM allocation ... failed" warning above came from the function `_out_of_memory_handle`, which gives up after trying twice to spill via memory devices (buffers).
I also notice that the SpillManager created only 3 buffers. Why only 3 buffers when there is plenty of memory available on the host? Is this count of 3 buffers a default setting somewhere? (I tried reading the source of the SpillManager and SpillableBufferOwner classes but have not found it yet, so any pointers are appreciated; thanks in advance.)
I also tried disabling spilling by setting `device_memory_limit=0` (since there is ample memory on the host) but got the exact same error. It seems to me that the limit of only 3 buffers is the start of the failure, regardless of the `device_memory_limit` setting.
Am I missing something? Any advice/pointers are greatly appreciated. Thank you.