Qwen 3.5 397B int4-AutoRound Weight Conflict on Dual Spark

I’m hitting a FilesBufferOnDevice exception when loading the Intel AutoRound quant of Qwen 3.5 397B on a dual Spark setup. I know others have this running successfully, so I’m trying to figure out if my local files are fundamentally different or if it’s my loader config.

The specific error is:

`Exception: FilesBufferOnDevice: key model.language_model.layers.59.mlp.experts.213.down_proj.qweight must be unique among files`

A clean download shows a direct conflict between the index and the binary shards:

`model.safetensors.index.json` maps the key to:

`model-00038-of-00040.safetensors`
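For reference, that index file uses the standard Hugging Face sharded-checkpoint layout, so the claimed owner of any key can be looked up directly. A minimal sketch (the key name is the one from the error above; `shard_for_key` is just a helper name, not part of any library):

```python
import json


def shard_for_key(index_path, key):
    """Return the shard filename that the index's weight_map assigns to
    a tensor name. Hugging Face sharded-checkpoint index format:
    {"metadata": {...}, "weight_map": {tensor_name: shard_filename}}."""
    with open(index_path) as f:
        index = json.load(f)
    return index["weight_map"].get(key)


key = "model.language_model.layers.59.mlp.experts.213.down_proj.qweight"
# e.g. shard_for_key("model.safetensors.index.json", key)
# -> "model-00038-of-00040.safetensors" on my download
```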

But a grep through the actual shards shows the key is physically present in both shard 00039 and shard 00040:

`$ grep -ao "model.language_model.layers.59.mlp.experts.213.down_proj.qweight" *.safetensors | cut -d: -f1 | sort | uniq -c`

The loader kills the process at iteration 19/21 because the same key appears in more than one file.
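Instead of grepping raw bytes, you can also parse each shard's safetensors header directly: the format starts with an 8-byte little-endian header length, followed by that many bytes of JSON listing every tensor in the file. A minimal sketch (function names are mine, not from any tool):

```python
import json
import struct
from collections import defaultdict
from pathlib import Path


def safetensors_keys(path):
    """Return the tensor names stored in one .safetensors file.

    The file begins with an 8-byte little-endian header length,
    followed by that many bytes of JSON mapping tensor names to
    their dtype/shape/offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return [k for k in header if k != "__metadata__"]


def find_duplicate_keys(shard_dir):
    """Map each tensor name to the shards containing it, keeping
    only names that appear in more than one shard."""
    owners = defaultdict(list)
    for shard in sorted(Path(shard_dir).glob("*.safetensors")):
        for key in safetensors_keys(shard):
            owners[key].append(shard.name)
    return {k: v for k, v in owners.items() if len(v) > 1}
```

Running `find_duplicate_keys(".")` in the model directory should list exactly the keys the loader complains about; an empty dict would mean the shards are clean and the problem is in the loader config.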

For those running this successfully on dual Spark:

1. Can you run the grep above and see if your shards also show this duplication?

2. What is your exact vLLM launch command?

I’m trying to determine if there’s a specific configuration that bypasses this or if I need to re-fetch a specific commit of the shards.

I suggest following this thread: Qwen3.5-397B-A17B run in dual spark! (but I have a concern),
or using eugr's spark-vllm, which has a recipe for Qwen3.5 397B AutoRound.

Hi again!

Spark Arena experimental recipe file: recipe-registry/experimental-recipes/eugr-vllm/qwen3.5-397b-a17b-int4-autoround-2x-vllm.yaml at main · spark-arena/recipe-registry · GitHub (to see what others are doing for vllm commands).

That file is intended for use with sparkrun. If you run that recipe with sparkrun, it’ll use the latest version of spark-vllm-docker.

I haven't run that model in a while, but I do see the duplication in the index data. I'll redownload and try it.

It seems using --tf instead of --pre-tf, or copying the recipe flags, got me past that error. But at this point I'm stuck with Ray taking up memory (the worker gets OOM-killed). It seems I need to switch over to the recipe format to drop Ray for now and see how that goes instead.

Just in case, since you mentioned Ray: start the recipe with --no-ray.

I've tried --no-ray; I still get a memory error at initial startup saying that 111.xxGB is not enough to satisfy a 112GB request. I have next to nothing running on the Sparks, and I've disabled X11 (switched to multi-user.target) as part of initial setup. The only thing running beyond the usual system services is screen, to multiplex my one SSH session.

May I ask which GB10 you have? I found that ASUS reserves more memory than e.g. the NVIDIA Spark or Lenovo.
For example, the NVIDIA Spark in my setup has 122 GB free memory and 2.4 GB memory utilization without anything running in user space; the ASUS has 120 GB free memory and 4.1 GB memory utilization!
With this background I was forced to lower the KV cache memory and reduce the max context size.
What is your overall free memory, and how much memory is used by the nodes?

Mine are also GX10s; what did you do to free up more memory, or what did you kill?

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           119Gi        64Gi       3.6Gi       1.1Gi        54Gi        55Gi
Swap:             0B          0B          0B

(of course, stuff is running atm, so memory is used)

Unfortunately, I was not able to do much.
Mine after a fresh reboot, with nothing running:

               total        used        free      shared  buff/cache   available
Mem:           119Gi       3.6Gi       106Gi       160Ki        10Gi       115Gi

You could uninstall a lot of Ubuntu packages like cups, snapd, etc. That helps a little bit. In the end I had to start Qwen3.5-397B with 106 GB of memory and a reduced context size of 200k.
Otherwise I get the same error as you.
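For context on why shrinking the context window helps: KV cache memory grows linearly with context length. A back-of-the-envelope estimate — the layer/head numbers below are purely illustrative placeholders, not the actual Qwen 3.5 397B config:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   bytes_per_elem=2):
    """Estimate KV cache size for one sequence: keys and values
    (factor 2) per layer, per KV head, per head dim, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical config: 60 layers, 8 KV heads, head_dim 128, fp16 cache.
full = kv_cache_bytes(60, 8, 128, 262_144)      # e.g. full 256k context
reduced = kv_cache_bytes(60, 8, 128, 200_000)   # reduced to 200k
print(f"saved ~ {(full - reduced) / 2**30:.1f} GiB")
```

Even with made-up numbers, the point is that cutting the max context by tens of thousands of tokens can free double-digit GiB, which is the difference between fitting in ~106 GB and getting OOM-killed.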

neuron:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           119Gi       3.9Gi       114Gi       2.1Mi       1.7Gi       115Gi
Swap:             0B          0B          0B
synapse:~/spark-vllm-compose/worker1$ free -h
               total        used        free      shared  buff/cache   available
Mem:           121Gi       4.2Gi       116Gi       2.1Mi       2.3Gi       117Gi
Swap:             0B          0B          0B

So weird, each of my GX10s reports a different total value.

Edit: just ran `fwupdmgr update`; now both machines say 121Gi total.


Guess you noticed (the firmware update). The ~2GB difference comes from how much memory is reserved in UEFI firmware. NVIDIA reduced it from 4GB to 2GB (I think, going from memory), so the ~2GB gain hit the Founders Edition first and eventually made it through to the partner models. FYI.