Issues deserializing a GPU-trained model on CPU with cuml-cpu

Hi there,

I am running a RAPIDS 24.04 environment and using cuML's HDBSCAN to cluster some data. Training the model on my GPU instance works without a hitch. However, I now want to run approximate prediction with the trained model on a CPU instance, and I am hitting a blocker when I try to deserialize it.

My CPU environment has been set up with RAPIDS 24.04 and cuml-cpu=24.04.

The documentation is a bit light (see the "cuML on GPU and CPU" page in the cuml 24.04.00 docs), but my intended process is as follows:

  1. Train the HDBSCAN model using GPU
  2. Serialize the trained model using pickle
  3. Transfer the .pkl file to the CPU (cuml-cpu) instance
  4. Deserialize the model on the CPU instance
  5. Run approximate predictions
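For reference, the five steps above can be sketched end to end with Python's standard `pickle` module. This is a minimal, hypothetical sketch: `StandInClusterer`, `save_model` and `load_model` are invented names, the stand-in model holds only plain Python data, and its `predict` uses a simple nearest-centroid rule rather than HDBSCAN's approximate prediction. In the real workflow, step 1 would presumably fit `cuml.cluster.HDBSCAN` with `prediction_data=True`, and step 5 would call `cuml.cluster.hdbscan.approximate_predict`. The "transfer" in step 3 is simulated by writing to and reading from a local temporary file.

```python
import os
import pickle
import tempfile


class StandInClusterer:
    """Hypothetical placeholder for a trained clustering model.

    It stores only plain Python data, so it pickles and unpickles the
    same way on any machine (unlike an object tied to GPU resources).
    """

    def __init__(self, centroids):
        self.centroids = centroids  # list of (x, y) tuples

    def predict(self, points):
        # Nearest-centroid labels: a simple stand-in for step 5,
        # NOT HDBSCAN's approximate prediction.
        labels = []
        for px, py in points:
            dists = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in self.centroids]
            labels.append(dists.index(min(dists)))
        return labels


def save_model(model, path):
    # Step 2: serialize the trained model to disk.
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # Step 4: deserialize on the target machine. The context manager
    # closes the file, unlike pickle.load(open(...)).
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Step 1 (stand-in): a "trained" model with two cluster centres.
    trained = StandInClusterer([(0.0, 0.0), (10.0, 10.0)])
    path = os.path.join(tempfile.mkdtemp(), "model.pkl")
    save_model(trained, path)      # step 2
    restored = load_model(path)    # steps 3-4 (transfer simulated locally)
    print(restored.predict([(1.0, 0.5), (9.0, 11.0)]))  # step 5, prints [0, 1]
```

A plain-data object like this round-trips cleanly; the open question in this thread is whether a GPU-fitted cuML estimator can make the same trip to a cuml-cpu host.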

Everything goes swimmingly until step 4, where deserialization fails with an error like this:

Traceback (most recent call last):
  File "…", line 104, in
    clusterer = pickle.load(open("/…/v33.1712898524.pkl", "rb"))
  File "hdbscan.pyx", line 1008, in cuml.cluster.hdbscan.hdbscan.HDBSCAN.__setstate__
AttributeError: 'NoneType' object has no attribute 'sync'

I’ve tried a couple of other clustering algorithms, and although the errors differ, they all fail at the same point: deserializing the GPU-trained model with the CPU version of cuML.

I have been struggling to figure out where I am going wrong. Perhaps my whole process is wrong; any help pointing me in the right direction would be amazing.



You may get better help through one of the RAPIDS-recommended community channels.