Hi,
I’m working on model interpretability (specifically, visualizing attention flow) for Megatron models such as biomegatron345m_biovocab_30k_cased and biomegatron-bert-345m-cased. Has anyone worked on this before and could offer some advice?
Also, does anyone know where I can find the model scripts for the BioMegatron models?
Thank you!