Inference with a fine-tuned MegaMolBART model fails with "Error(s) in loading state_dict"; the provided megamolbart.nemo works fine

Hi,
When I run fine-tuning on a MegaMolBART model using the example code, fine-tuning on the SAMPL CSV file, it seems to run well and generates a new .nemo file. My issue is that when I try to use that generated .nemo file for inference, it fails with 'RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:', while the same inference code works fine with the downloadable megamolbart.nemo file. Does anyone have any ideas, or any inference code/config they could share to help me solve this one?
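
In case it helps with diagnosis: a .nemo file is just a tar archive containing a model_config.yaml plus the model weights, so the saved configs of the two checkpoints can be compared directly. A minimal sketch (the file name inside the archive follows the usual NeMo layout, and the local paths are placeholders for my files):

import tarfile

import yaml

def read_nemo_config(nemo_path: str) -> dict:
    # A .nemo archive holds model_config.yaml alongside the weights;
    # pull it out and parse it so the two checkpoints can be compared.
    with tarfile.open(nemo_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.name.endswith("model_config.yaml"):
                with tar.extractfile(member) as f:
                    return yaml.safe_load(f)
    raise FileNotFoundError(f"no model_config.yaml inside {nemo_path}")

base_cfg = read_nemo_config("megamolbart.nemo")    # downloadable model
tuned_cfg = read_nemo_config("my_finetuned.nemo")  # my fine-tuned model
print(base_cfg.get("target"), "vs", tuned_cfg.get("target"))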

When I run it on the downloadable megamolbart.nemo model I get this:
[NeMo I 2024-07-08 21:39:30 regex_tokenizer:240] Loading vocabulary from file = /tmp/tmp7bu1mkb2/36b36f49c3e64962a7b54f1a1ba2b580_megamolbart.vocab
[NeMo I 2024-07-08 21:39:30 regex_tokenizer:254] Loading regex from file = /tmp/tmp7bu1mkb2/111b90cc2819425382967ab999101096_megamolbart.model
[NeMo I 2024-07-08 21:39:30 megatron_base_model:315] Padded vocab_size: 640, original vocab_size: 523, dummy tokens: 117.
[NeMo I 2024-07-08 21:39:30 nlp_overrides:752] Model MegaMolBARTModel was successfully restored from /workspace/bionemo/models/molecule/megamolbart/megamolbart.nemo.
Loaded a <class 'bionemo.model.molecule.megamolbart.infer.MegaMolBARTInference'>
hidden_states.shape=torch.Size([2, 45, 512])
pad_masks.shape=torch.Size([2, 45])
embeddings.shape=torch.Size([2, 512])
embedding.shape=torch.Size([2, 512])
[NeMo I 2024-07-08 21:39:32 megatron_lm_encoder_decoder_model:1195] Decoding using the greedy-search method…

When I try to use my fine-tuned model I get this:
[NeMo I 2024-07-08 21:35:14 regex_tokenizer:240] Loading vocabulary from file = /workspace/bionemo/tokenizers/molecule/megamolbart/vocab/megamolbart.vocab
[NeMo I 2024-07-08 21:35:14 regex_tokenizer:254] Loading regex from file = /workspace/bionemo/tokenizers/molecule/megamolbart/vocab/megamolbart.model
[NeMo I 2024-07-08 21:35:14 megatron_base_model:315] Padded vocab_size: 640, original vocab_size: 523, dummy tokens: 117.
[NeMo W 2024-07-08 21:35:14 megatron_lm_encoder_decoder_model:240] Could not find encoder or decoder in config. This is probably because of restoring an old checkpoint. Copying shared model configs to encoder and decoder configs.
[NeMo W 2024-07-08 21:35:14 megatron_lm_encoder_decoder_model:206] bias_gelu_fusion is deprecated. Please use bias_activation_fusion instead.
[NeMo W 2024-07-08 21:35:14 megatron_lm_encoder_decoder_model:206] bias_gelu_fusion is deprecated. Please use bias_activation_fusion instead.
Traceback (most recent call last):
  File "/workspace/bionemo/examples/molecule/megamolbart/mycompany_infer.py", line 57, in <module>
    inferer = load_model_for_inference(cfg, interactive=True)
  File "/workspace/bionemo/bionemo/triton/utils.py", line 166, in load_model_for_inference
    model = infer_class(cfg, interactive=interactive)
  File "/workspace/bionemo/bionemo/model/molecule/infer.py", line 39, in __init__
    super().__init__(
  File "/workspace/bionemo/bionemo/model/core/infer.py", line 458, in __init__
    super().__init__(
  File "/workspace/bionemo/bionemo/model/core/infer.py", line 137, in __init__
    self.model = self.load_model(cfg, model=model, restore_path=restore_path)
  File "/workspace/bionemo/bionemo/model/core/infer.py", line 203, in load_model
    model = restore_model(
  File "/workspace/bionemo/bionemo/model/utils.py", line 361, in restore_model
    model = model_cls.restore_from(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/nlp_model.py", line 465, in restore_from
    return super().restore_from(
  File "/usr/local/lib/python3.10/dist-packages/nemo/core/classes/modelPT.py", line 442, in restore_from
    instance = cls._save_restore_connector.restore_from(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/parts/nlp_overrides.py", line 751, in restore_from
    super().load_instance_with_state_dict(instance, state_dict, strict)
  File "/usr/local/lib/python3.10/dist-packages/nemo/core/connectors/save_restore_connector.py", line 203, in load_instance_with_state_dict
    instance.load_state_dict(state_dict, strict=strict)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/nlp_model.py", line 447, in load_state_dict
    results = super(NLPModel, self).load_state_dict(state_dict, strict=strict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MegaMolBARTModel:
    Missing key(s) in state_dict: "enc_dec_model.encoder_embedding.word_embeddings.weight", "enc_dec_model.encoder_embedding.position_embeddings.weight", "enc_dec_model.decoder_embedding.word_embeddings.weight", "enc_dec_model.decoder_embedding.position_embeddings.weight", "enc_dec_model.enc_dec_model.encoder.model.layers.0.input_layernorm.weight", "enc_dec_model.enc_dec_model.encoder.model.layers.0.input_layernorm.bias", "enc_dec_model.enc_dec_model.encoder.model.layers.0.self_attention.query_key_value.weight", "enc_dec_model.enc_dec_model.encoder.model.layers.0.self_attention.query_key_value.bias",
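
All of the missing keys sit under enc_dec_model..., which makes me wonder whether the fine-tuned archive stores its weights under a different prefix (e.g. wrapped by a downstream-task model class). The key sets of the two checkpoints can be diffed with a similar sketch (assuming the weights file inside the archive has the standard NeMo name model_weights.ckpt):

import tarfile

import torch

def read_state_dict_keys(nemo_path: str) -> set:
    # Load just the checkpoint file inside the .nemo tar and return its keys.
    with tarfile.open(nemo_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.name.endswith("model_weights.ckpt"):
                with tar.extractfile(member) as f:
                    state = torch.load(f, map_location="cpu")
                return set(state.keys())
    raise FileNotFoundError(f"no model_weights.ckpt inside {nemo_path}")

base_keys = read_state_dict_keys("megamolbart.nemo")
tuned_keys = read_state_dict_keys("my_finetuned.nemo")
print("only in fine-tuned:", sorted(tuned_keys - base_keys)[:10])
print("only in base:", sorted(base_keys - tuned_keys)[:10])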

The key parts of the code are:
from bionemo.triton.utils import load_model_for_inference
from bionemo.model.molecule.megamolbart.infer import MegaMolBARTInference

print(f"Loading model")
inferer = load_model_for_inference(cfg, interactive=True)
print(f"Loaded a {type(inferer)}")

I've also filed this on the NeMo GitHub repo: https://github.com/NVIDIA/NeMo/issues/9685 (linking for visibility).