@tenari I worked with Claude to fix the bug… here are the details if you want to integrate the fix into the codebase:
Bug: MTP layers silently skipped when config omits num_nextn_predict_layers
Some Qwen3.5/3.6 finetunes (e.g. Jackrong/Qwopus3.6-35B-A3B-v1) retain the mtp.* weights
from the base checkpoint but do not carry the num_nextn_predict_layers / num_mtp_layers
field in their config.json. This causes build_extended_shard_regexes to derive
n_mtp_config = 0, and the existing guard:
n_mtp = min(n_mtp_config, n_mtp_actual) if n_mtp_actual > 0 else 0
silently resolves to min(0, 1) = 0, dropping all MTP shards from the probe/cost schedule.
The probe still captures MTP Fisher stats (because incremental_probe has a separate detection
path), but the cost pickle ends up with zero mtp.* entries. The allocator therefore produces
a layer_config.json with no mtp.* assignments, and the export fails at the
validate_mtp_assignment_coverage guard with:
RuntimeError: source checkpoint contains mtp.* weights but the allocator recipe contains
no mtp.* entries. Re-run the incremental probe + cost with --include-mtp (the default)
so mtp.* tensors are measured, then rerun allocator/export.
Fix
In incremental_probe.py, build_extended_shard_regexes: treat the three cases explicitly
instead of using a single min():
| n_mtp_config | n_mtp_actual | Old behaviour | New behaviour |
|---|---|---|---|
| 0 | > 0 | min(0, N) = 0 — MTP silently dropped | Trust weights; schedule n_mtp_actual shards + log notice |
| > 0 | 0 | Skip (correct) | Unchanged |
| > 0 | > 0 | min(config, actual) | Unchanged |
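
For reference, the table can be sketched as a small standalone function. This is an illustration only, not the actual patch: the helper names `resolve_n_mtp_old` and `resolve_n_mtp` are hypothetical, the log wording is made up, and in the real code the logic lives inline in `build_extended_shard_regexes`. The (0, 0) case is not in the table; the sketch assumes it stays at 0.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_n_mtp_old(n_mtp_config: int, n_mtp_actual: int) -> int:
    """The existing guard, reproduced for comparison."""
    return min(n_mtp_config, n_mtp_actual) if n_mtp_actual > 0 else 0


def resolve_n_mtp(n_mtp_config: int, n_mtp_actual: int) -> int:
    """Hypothetical helper: decide how many MTP shards to schedule."""
    if n_mtp_actual <= 0:
        # No mtp.* weights in the checkpoint: nothing to schedule.
        return 0
    if n_mtp_config <= 0:
        # Config omits the field but the weights exist: trust the weights.
        logger.info(
            "config reports 0 MTP layers but checkpoint has %d; "
            "scheduling from weights",
            n_mtp_actual,
        )
        return n_mtp_actual
    # Both present: unchanged behaviour.
    return min(n_mtp_config, n_mtp_actual)
```

With these definitions, `resolve_n_mtp_old(0, 1)` gives 0 (the bug), while `resolve_n_mtp(0, 1)` gives 1; the other two rows of the table return the same value from both functions.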
No changes to incremental_measure_quant_cost.py or the export — the fix is entirely in the
shared shard-schedule function that both probe and cost consume.