Hi there,
Does anybody know how I can train just last 1,2, or 3 layers of BERT using Jarvis functionalities?
I want to experiment with performance of BERT on my local text corpus through domain adaptation, but don’t want to fine-tune the entire architecture.