Hi - I’m not having much luck getting a LoRA for recent SOTA MoE models to work with vLLM.
I tried this solution: "Bf16 LoRA Fine-Tuning of Qwen3.5-35B-A3B on DGX Spark — No Quantization Required", but the resulting adapter didn't load into vLLM, apparently because of a mismatch between the Unsloth LoRA format, which fuses the expert tensors, and the layout vLLM expects. I also tried training a LoRA on just the attention layers, but the loss stayed very high. And if you try to use Unsloth Studio directly, you run out of memory (at least for Qwen3.5-35B-A3B and Gemma-4-26B-A4B).
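For context, here's a stdlib-only sketch of the kind of key-name mismatch I mean. The tensor names below are illustrative, not the exact checkpoint keys from either tool; the point is just that a fused-expert adapter has one LoRA tensor per MoE layer with no per-expert index, while a per-expert layout has an index after `experts`:

```python
def looks_fused(key: str) -> bool:
    """Heuristic: a fused expert LoRA tensor has no per-expert index
    immediately after the 'experts' segment of its name."""
    parts = key.split(".")
    i = parts.index("experts")
    return not parts[i + 1].isdigit()

# Hypothetical Unsloth-style fused key (one tensor covering all experts)
fused_key = "model.layers.0.mlp.experts.gate_up_proj.lora_A.weight"

# Hypothetical per-expert keys (one tensor per expert)
per_expert_keys = [
    f"model.layers.0.mlp.experts.{i}.gate_proj.lora_A.weight" for i in range(4)
]

print(looks_fused(fused_key))                         # True
print(any(looks_fused(k) for k in per_expert_keys))   # False
```

Dumping the actual key names out of the adapter's safetensors file and eyeballing them against what vLLM logs when it rejects the adapter is how I concluded the formats disagree.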
If anyone has worked out how to do this without going OOM, producing a LoRA that actually affects the model's output and loads into vLLM, I would love to copy your approach.
thanks