Anyone have a solution for LoRA training of recent MoE models like Qwen3.5-35B-A3B or Gemma-4-26B-A4B that *also* loads and runs successfully in vLLM?

Hi - I'm not having much luck getting a LoRA for recent SOTA MoE models to work with vLLM.

I tried the approach from "Bf16 LoRA Fine-Tuning of Qwen3.5-35B-A3B on DGX Spark — No Quantization Required", but the resulting adapter failed to load into vLLM, apparently because of a mismatch between Unsloth's LoRA format, which saves the expert weights as fused tensors, and the layout vLLM expects. I also tried training only the attention layers, but the loss stayed very high. And if you use Unsloth Studio directly, you run out of memory (at least for Qwen3.5-35B-A3B and Gemma-4-26B-A4B).
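For what it's worth, you can sanity-check an adapter for this mismatch before serving by listing its tensor names: vLLM's LoRA loader expects per-module `lora_A`/`lora_B` pairs, so a single stacked tensor covering all experts won't match. A minimal sketch - the key patterns here are assumptions based on what I've seen, so verify them against the actual keys in your `adapter_model.safetensors`:

```python
import re

# Hypothetical fused-expert pattern: '.experts.' NOT followed by a
# per-expert index, e.g. '...mlp.experts.gate_up_proj.lora_A.weight'
# (fused) vs '...mlp.experts.3.gate_proj.lora_A.weight' (per-expert).
FUSED = re.compile(r"\.experts\.(?!\d+\.)")

def fused_expert_keys(keys):
    """Return adapter tensor names that look fused across all experts."""
    return [k for k in keys if FUSED.search(k)]

# Made-up example keys for illustration:
keys = [
    "base_model.model.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.layers.0.mlp.experts.gate_up_proj.lora_A.weight",
    "base_model.model.layers.0.mlp.experts.3.down_proj.lora_A.weight",
]
print(fused_expert_keys(keys))
# → ['base_model.model.layers.0.mlp.experts.gate_up_proj.lora_A.weight']
```

If this flags anything, the adapter would need its expert tensors unstacked into per-expert modules before vLLM could load it.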

If anyone has worked out how to do this without going OOM, produced a LoRA that actually changes the model's output, and run it in vLLM, I would love to copy your approach.
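One workaround I'm considering (untested, and the module names below are assumptions based on common Qwen/Gemma MoE layouts - check your model's `named_modules()`): restrict the LoRA `target_modules` to layers vLLM's LoRA path handles - attention projections plus any dense/shared MLP - and skip the routed-expert weights entirely, so there is nothing fused to convert. A sketch of building such a target list:

```python
# Hypothetical helper for building a target_modules list (e.g. for a
# PEFT LoraConfig) that avoids routed-expert tensors.
ATTN_PROJS = ["q_proj", "k_proj", "v_proj", "o_proj"]
SHARED_MLP = ["gate_proj", "up_proj", "down_proj"]  # dense/shared MLP only

def lora_target_modules(module_names):
    """Keep attention and shared-MLP projections; drop anything under a
    routed-experts subtree (e.g. '...mlp.experts.3.gate_proj')."""
    keep = set(ATTN_PROJS + SHARED_MLP)
    targets = []
    for name in module_names:
        leaf = name.rsplit(".", 1)[-1]
        if leaf in keep and ".experts." not in name:
            targets.append(name)
    return targets

# Tiny demonstration with made-up module paths:
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.experts.7.gate_proj",  # routed expert -> excluded
]
print(lora_target_modules(names))
# → ['model.layers.0.self_attn.q_proj', 'model.layers.0.mlp.gate_proj']
```

Whether that gets the loss down on these particular models I don't know - attention-only didn't for me - but it should at least produce an adapter vLLM can load.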

Thanks.


Take a look at my GitHub repo.