Issues deserializing GPU-trained model on CPU with cuml-cpu

Hi there,

I am running a RAPIDS 24.04 environment and using cuML HDBSCAN to cluster some data. Training the model on my GPU instance works without a hitch. However, I now want to run approximate prediction on a CPU instance using the trained model, and I am stuck trying to deserialize it there.

My CPU environment has been set up with RAPIDS 24.04 with cuml-cpu=24.04.

Documentation is a bit light (see "cuML on GPU and CPU" in the cuml 24.04.00 documentation), but my desired process is as follows:

  1. Train the HDBSCAN model using GPU
  2. Serialize the trained model using pickle
  3. Transfer the pkl to the CPU (cuML-cpu) instance
  4. Deserialize the model on the CPU instance
  5. Run approximate predictions
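
In code, the process I have in mind looks roughly like the sketch below. The helper names (train_on_gpu, save_model, load_model, predict_on_cpu) are hypothetical and only for illustration, and min_cluster_size=5 is a placeholder value. I am assuming prediction_data=True is needed at fit time so that approximate_predict has something to work with. Whether load_model can succeed under cuml-cpu is exactly the open question here, so treat this as a description of the intent, not a working solution.

```python
import pickle


def train_on_gpu(X, min_cluster_size=5):
    """Step 1: fit HDBSCAN on the GPU instance (requires cuML with a GPU)."""
    # Imported lazily so the pickle helpers below can be used without cuML.
    from cuml.cluster import HDBSCAN

    # Assumption: prediction data must be generated for approximate prediction.
    model = HDBSCAN(min_cluster_size=min_cluster_size, prediction_data=True)
    model.fit(X)
    return model


def save_model(model, path):
    """Step 2: serialize the trained model with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Step 4: deserialize the model (this is the step that fails on cuml-cpu)."""
    # A context manager closes the file even if unpickling raises.
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_on_cpu(model, X_new):
    """Step 5: approximate prediction for new points."""
    from cuml.cluster.hdbscan import approximate_predict

    labels, probabilities = approximate_predict(model, X_new)
    return labels, probabilities
```

Step 3 is just copying the .pkl file between the two instances. The save and load helpers are plain pickle, so they can be tried with any picklable object to rule out a file transfer problem before blaming cuML.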

Everything goes swimmingly until step 4, where I hit a wall and get an error that resembles…

Traceback (most recent call last):
  File "…", line 104, in <module>
    clusterer = pickle.load(open("/…/v33.1712898524.pkl", "rb"))
  File "hdbscan.pyx", line 1008, in cuml.cluster.hdbscan.hdbscan.HDBSCAN.__setstate__
AttributeError: 'NoneType' object has no attribute 'sync'

I’ve tried a couple of different clustering algorithms, and although the errors aren’t the same, I keep hitting a wall at the same place: deserializing the model with the CPU version of cuML.

I have been struggling to figure out where I am going wrong with my implementation. Perhaps my process is completely wrong. Any help pointing me in the right direction would be amazing.

Thanks,

T

You may get better help using one of the RAPIDS-recommended community channels.