CUDA memory error when calculating SHAP values despite sufficient GPU memory

I am trying to calculate SHAP values from a previously trained random forest. I am getting the following error:

MemoryError: std::bad_alloc: CUDA error at: /opt/anaconda3/envs/rapids-21.12/include/rmm/mr/device/cuda_memory_resource.hpp

The code I am using is:

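import pickle

import cupy as cp
from cuml.explainer import KernelExplainer

# Load the previously trained cuML random forest
filename = 'cuml_random_forest_model.sav'
with open(filename, 'rb') as f:
    cuml_model = pickle.load(f)

# Load the test set as a CuPy array on the GPU
arr_cupy_X_test = cp.load("arr_cupy_X_test.npy")

# NOTE: the original call was truncated after `model=`; the `data=` argument
# below is an assumed background dataset, not necessarily the one used.
cu_explainer = KernelExplainer(model=cuml_model.predict,
                               data=arr_cupy_X_test)
cu_shap_values = cu_explainer.shap_values(arr_cupy_X_test)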

I am using gpu_usage() and torch.cuda.empty_cache() to clear GPU memory. I have reduced the size of the test array arr_cupy_X_test to 100, but I still receive the error.
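
One thing that may be worth checking: torch.cuda.empty_cache() only releases memory cached by PyTorch's own allocator, so it does not return memory held by CuPy's memory pools. Below is a minimal sketch of releasing CuPy's unused pooled blocks and then reading the free and total device memory. The helper name free_cupy_memory is hypothetical, and the guarded import is only there so the snippet returns None on a machine without CuPy or without a usable GPU. Whether this resolves the error above is not known.

```python
def free_cupy_memory():
    """Release unused CuPy pool blocks; return (free_bytes, total_bytes) or None."""
    try:
        import cupy as cp
    except ImportError:
        return None  # CuPy is not installed in this environment

    try:
        # Return unused blocks held by CuPy's device and pinned-host pools
        cp.get_default_memory_pool().free_all_blocks()
        cp.get_default_pinned_memory_pool().free_all_blocks()
        # Ask the CUDA runtime how much device memory is free / available in total
        return cp.cuda.runtime.memGetInfo()
    except cp.cuda.runtime.CUDARuntimeError:
        return None  # no usable CUDA device


mem = free_cupy_memory()
if mem is not None:
    free_bytes, total_bytes = mem
    print(f"free: {free_bytes / 2**30:.2f} GiB of {total_bytes / 2**30:.2f} GiB")
```

Blocks that are still referenced by live arrays (for example arr_cupy_X_test) are not released by this call; only unused pooled blocks are.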

Could there be another issue with the cuML KernelExplainer?

Any suggestions are welcome.

Note: the issue is now also addressed on GitHub in rapidsai/cuml, Issue #4604: "[BUG] Aggressive memory usage with KernelExplainer on wide datasets leads to IllegalMemoryAccess".
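
To get a feel for why wide datasets can be expensive for Kernel SHAP, here is a rough back-of-the-envelope sketch. It assumes that the explainer evaluates the model on a synthetic matrix of nsamples × background rows by n_features values for each explained row, that nsamples defaults to 2 * n_features + 2048, and that values are stored as float32 (4 bytes). These are assumptions about the implementation rather than confirmed details, and the function name kernel_shap_synth_bytes and the shapes used are hypothetical.

```python
def kernel_shap_synth_bytes(n_background_rows, n_features, nsamples=None, itemsize=4):
    """Rough size in bytes of the synthetic matrix evaluated per explained row."""
    if nsamples is None:
        # Assumed default for nsamples: 2 * n_features + 2048
        nsamples = 2 * n_features + 2048
    # Each sampled coalition is combined with every background row
    synth_rows = nsamples * n_background_rows
    return synth_rows * n_features * itemsize


# Hypothetical shapes for illustration only; not taken from the post
for n_background_rows in (1, 100, 10_000):
    size_gib = kernel_shap_synth_bytes(n_background_rows, n_features=500) / 2**30
    print(f"{n_background_rows:>6} background rows -> {size_gib:.3f} GiB")
```

Under these assumptions the estimate grows with the number of background rows and roughly quadratically with the number of features, so a small background set (or a summary of it) keeps the footprint down.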

