I’m planning to fine-tune Qwen/Qwen3-VL-30B-A3B-Instruct using NeMo AutoModel on a single-node 8xH100 setup with LoRA (PEFT).
I see from the VLM model coverage table that this model is supported with both FSDP2 and PEFT, with the reference config `qwen3_vl_moe_30b_te_deepep.yaml`.
However, I can’t find the actual contents of this YAML file in the docs or the GitHub repo examples. Could you share the reference config or point me to where it’s published?
Specifically I need to know:
- LoRA `target_modules` — I'm targeting `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` and excluding the MoE router. Is this correct?
- Batch size recommendations for 8x H100 80 GB with this MoE model (128 experts, top-8)
- Does the `nvcr.io/nvidia/nemo-automodel:26.02` container include the MoE LoRA improvements from PR #1300 (merged 2026-02-26)?
- DeepEP configuration — is `ep_size: 8` the right setting for 8 GPUs?
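To make the question concrete, here is the fragment I'm currently guessing at. The key names and structure below are my own assumptions pieced together from the docs, not the contents of the actual reference config, so please correct anything that's off:

```yaml
# My current guess at the relevant sections — key names are assumptions,
# NOT taken from the real qwen3_vl_moe_30b_te_deepep.yaml.
peft:
  peft_scheme: lora
  lora:
    dim: 32            # rank 32, matching my setup
    alpha: 64          # assumed 2x rank; is there a recommended value?
    target_modules:
      - q_proj
      - k_proj
      - v_proj
      - o_proj
      - gate_proj
      - up_proj
      - down_proj
    # MoE router deliberately excluded from LoRA targets

distributed:
  strategy: fsdp2      # single node, 8 GPUs

moe:
  backend: deepep
  ep_size: 8           # assumed: one expert-parallel rank per GPU
```

If the published config uses a different schema (e.g. different section names or a separate DeepEP block), a pointer to the real file would clear all of this up at once.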
My setup:
- Hardware: 8x NVIDIA H100 80GB (single node, Thunder Compute)
- Model: Qwen/Qwen3-VL-30B-A3B-Instruct (NOT the FP8 variant)
- Data: 1M text Q&A pairs (instruction/output format, Hebrew + English)
- Method: LoRA, rank 32, FSDP2
- Container: `nvcr.io/nvidia/nemo-automodel:26.02`
Thank you!
Vitali Yudilevich
Founder, Allocator (allocator.live)