Data Collator for FastPitch and HiFi-GAN Fine-Tuning in NeMo

Hello everyone,

I am currently fine-tuning the FastPitch and HiFi-GAN models with the NeMo toolkit on a custom dataset, on a system with an NVIDIA GeForce RTX 3090. Despite reducing the batch size, adjusting gradient accumulation, and applying other optimizations, I still encounter a “CUDA out of memory” error.

I am considering using a data collator to manage memory more efficiently, but I cannot find any relevant options in NeMo. Could anyone guide me on how to implement or customize a data collator within NeMo? Alternatively, are there other strategies within NeMo that I can use to overcome this memory issue?

Thank you for your assistance!

Best regards,
Hasan Maqsood

Hi @hasanmaqsood8747,
Could you share more details: which model and version are you using, and which documentation are you referring to?