NeMo AutoModel: Full YAML config for Qwen3-VL-30B-A3B LoRA fine-tuning (8xH100, single node)

I’m planning to fine-tune Qwen/Qwen3-VL-30B-A3B-Instruct using NeMo AutoModel on a single-node 8xH100 setup with LoRA (PEFT).

The VLM model coverage table shows this model as supported with both FSDP2 and PEFT, and lists qwen3_vl_moe_30b_te_deepep.yaml as the reference config.

However, I can’t find the actual contents of that YAML anywhere in the docs or in the GitHub repo’s examples. Could you share the reference config, or point me to where it’s published?

Specifically I need to know:

  1. LoRA target_modules — I plan to target q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj, and to exclude the MoE router. Is that the right set for this architecture?
  2. Batch size — what micro/global batch sizes do you recommend on 8x H100 80GB for this MoE model (128 experts, top-8 routing)?
  3. Container — does nvcr.io/nvidia/nemo-automodel:26.02 include the MoE LoRA improvements from PR #1300 (merged 2026-02-26)?
  4. DeepEP configuration — is ep_size: 8 the right setting for 8 GPUs?
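To make questions 1 and 4 concrete, here is the fragment I would write today. Every key name below (peft, target_modules, exclude_modules, dim, alpha, distributed, ep_size) is my guess modeled on other NeMo AutoModel LoRA recipes — it is not the actual contents of qwen3_vl_moe_30b_te_deepep.yaml, which is exactly what I’m asking for:

```yaml
# SKETCH ONLY — key names are assumptions, not the real reference config.
peft:
  target_modules:
    - "*.q_proj"
    - "*.k_proj"
    - "*.v_proj"
    - "*.o_proj"
    - "*.gate_proj"
    - "*.up_proj"
    - "*.down_proj"
  exclude_modules:
    - "*.mlp.gate"   # MoE router linear in Qwen MoE — intending to keep it frozen
  dim: 32            # LoRA rank
  alpha: 64

distributed:
  ep_size: 8         # one expert-parallel rank per GPU on a single 8xH100 node?
```

Corrections to any of these keys (or the router exclusion pattern) would be just as useful as the full file.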

My setup:

  • Hardware: 8x NVIDIA H100 80GB (single node, Thunder Compute)
  • Model: Qwen/Qwen3-VL-30B-A3B-Instruct (NOT the FP8 variant)
  • Data: 1M text Q&A pairs (instruction/output format, Hebrew + English)
  • Method: LoRA, rank 32, FSDP2
  • Container: nvcr.io/nvidia/nemo-automodel:26.02
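For context on the data, each training record is a single JSON line in the shape below. These are illustrative samples I made up, not real data; the Hebrew pair asks “What is reinforcement learning?” and answers “A learning paradigm in which an agent learns from rewards.”

```json
{"instruction": "What is expert parallelism?", "output": "Sharding a MoE model's experts across GPUs so each rank holds only a subset of experts."}
{"instruction": "מהי למידת חיזוק?", "output": "פרדיגמת למידה שבה סוכן לומד מתגמולים."}
```

If the reference config expects a different dataset schema (e.g. a chat/messages format), I’m happy to convert.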

Thank you!

Vitali Yudilevich
Founder, Allocator (allocator.live)