Originally published at: An Introduction to Model Merging for LLMs | NVIDIA Technical Blog
One challenge organizations face when customizing large language models (LLMs) is the need to run multiple experiments, which produces only one useful model. While the cost of experimentation is typically low, and the results well worth the effort, this experimentation process does involve “wasted” resources, such as compute assets spent without their product being utilized,…
Model Soup sounds similar to federated learning’s FedAvg method, where the models from multiple participants are averaged. But in FedAvg we usually also assign a weight to each model to indicate its contribution to the merged result.
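For reference, a minimal sketch of that kind of weighted parameter averaging is below. It assumes all checkpoints share an identical architecture (same state-dict keys and shapes); the function name and the 0.7/0.3 weights in the usage comment are just illustrative, not from the post.

```python
import torch

def weighted_average_state_dicts(state_dicts, weights):
    """Average model parameters, weighting each model's contribution.

    Assumes every state dict comes from the same architecture
    (identical keys and tensor shapes). Weights are normalized to sum to 1.
    """
    total = sum(weights)
    norm_weights = [w / total for w in weights]
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(
            w * sd[key].float() for w, sd in zip(norm_weights, state_dicts)
        )
    return merged

# Hypothetical usage: merge two fine-tuned checkpoints, weighting the first 70/30.
# merged = weighted_average_state_dicts(
#     [model_a.state_dict(), model_b.state_dict()], weights=[0.7, 0.3]
# )
# model_a.load_state_dict(merged)
```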
Curious whether any of these merging techniques can be applied to two models from different modalities, such as one trained on text and another on images, to create a multi-modal model?
At this time you can do “FrankenMerges” between different architectures - but I haven’t seen any exploration of merging across modalities!
It might be interesting to look into!