Qwen 3.5 397B int4-AutoRound Weight Conflict on Dual Spark

I’m hitting a FilesBufferOnDevice exception when loading the Intel AutoRound quant of Qwen 3.5 397B on a dual Spark setup. I know others have this running successfully, so I’m trying to figure out if my local files are fundamentally different or if it’s my loader config.

The specific error is:

`Exception: FilesBufferOnDevice: key model.language_model.layers.59.mlp.experts.213.down_proj.qweight must be unique among files`

A clean download shows a direct conflict between the index and the binary shards:

`model.safetensors.index.json` maps the key to:

`model-00038-of-00040.safetensors`
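For reference, that index file uses the standard Hugging Face sharded-checkpoint layout, so the claimed owner of any key can be looked up directly. A minimal sketch (the key name is the one from the error above; `shard_for_key` is just a helper name, not part of any library):

```python
import json


def shard_for_key(index_path, key):
    """Return the shard filename that the index's weight_map assigns to
    a tensor name. Hugging Face sharded-checkpoint index format:
    {"metadata": {...}, "weight_map": {tensor_name: shard_filename}}."""
    with open(index_path) as f:
        index = json.load(f)
    return index["weight_map"].get(key)


key = "model.language_model.layers.59.mlp.experts.213.down_proj.qweight"
# e.g. shard_for_key("model.safetensors.index.json", key)
# -> "model-00038-of-00040.safetensors" on my download
```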

But a grep through the actual shards shows the key is physically present in both shard 00039 and shard 00040:

`$ grep -ao "model.language_model.layers.59.mlp.experts.213.down_proj.qweight" *.safetensors | cut -d: -f1 | sort | uniq -c`

The loader kills the process at iteration 19/21 because the same key appears in more than one file.
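Instead of grepping raw bytes, you can also parse each shard's safetensors header directly: the format starts with an 8-byte little-endian header length, followed by that many bytes of JSON listing every tensor in the file. A minimal sketch (function names are mine, not from any tool):

```python
import json
import struct
from collections import defaultdict
from pathlib import Path


def safetensors_keys(path):
    """Return the tensor names stored in one .safetensors file.

    The file begins with an 8-byte little-endian header length,
    followed by that many bytes of JSON mapping tensor names to
    their dtype/shape/offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return [k for k in header if k != "__metadata__"]


def find_duplicate_keys(shard_dir):
    """Map each tensor name to the shards containing it, keeping
    only names that appear in more than one shard."""
    owners = defaultdict(list)
    for shard in sorted(Path(shard_dir).glob("*.safetensors")):
        for key in safetensors_keys(shard):
            owners[key].append(shard.name)
    return {k: v for k, v in owners.items() if len(v) > 1}
```

Running `find_duplicate_keys(".")` in the model directory should list exactly the keys the loader complains about; an empty dict would mean the shards are clean and the problem is in the loader config.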

For those running this successfully on dual Spark:

1. Can you run the grep above and see if your shards also show this duplication?

2. What is your exact vLLM launch command?

I’m trying to determine if there’s a specific configuration that bypasses this or if I need to re-fetch a specific commit of the shards.

I suggest following this thread: Qwen3.5-397B-A17B run in dual spark! (but I have a concern),
or using eugr's spark-vllm, which has a recipe for Qwen3.5 397B AutoRound.

Hi again!

Spark Arena experimental recipe file: recipe-registry/experimental-recipes/eugr-vllm/qwen3.5-397b-a17b-int4-autoround-2x-vllm.yaml at main · spark-arena/recipe-registry · GitHub (to see what others are doing for vllm commands).

That file is intended for use with sparkrun. If you run that recipe with sparkrun, it’ll use the latest version of spark-vllm-docker.

I haven't run that model in a while, but I do see the duplication in the index data. I'll redownload and try it.

It seems using --tf instead of --pre-tf, or copying the recipe flags, got me past that error. But at this point I'm stuck with Ray taking up memory (the worker gets OOM-killed). It seems I need to switch over to the recipe format to drop Ray for now and see how that goes instead.

Just in case, since you mentioned Ray: start the recipe with --no-ray.

I've tried --no-ray; I still get a memory error at initial startup saying that 111.xxGB is not enough to satisfy a 112GB request. I have next to nothing running on the Sparks, and I've disabled X11 (switched to multi-user.target) as part of initial setup. The only thing running beyond the usual system services is screen, to multiplex my one SSH session.

May I ask which GB10 you have? I found that ASUS reserves more memory than e.g. the NVIDIA Spark or Lenovo.
For example, the NVIDIA Spark in my setup has 122 GB free memory and 2.4 GB memory utilization without anything running in user space; the ASUS has 120 GB free memory and 4.1 GB memory utilization!
With this background I was forced to lower the KV cache memory and reduce the max context size.
What is your overall free memory, and how much memory is used by the nodes?

Mine are also GX10s; what did you do to free up more memory, or what did you kill?

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           119Gi        64Gi       3.6Gi       1.1Gi        54Gi        55Gi
Swap:             0B          0B          0B

(of course, stuff is running atm, so memory is used)

Unfortunately, I was not able to do much.
Mine after a fresh reboot, with nothing running:

               total        used        free      shared  buff/cache   available
Mem:           119Gi       3.6Gi       106Gi       160Ki        10Gi       115Gi

You could uninstall a lot of Ubuntu packages like cups, snapd, etc. That helps a little bit. In the end I had to start Qwen3.5-397B with 106 GB of memory and a reduced context size of 200k.
Otherwise I get the same error as you.
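For context on why shrinking the context window helps: KV cache memory grows linearly with context length. A back-of-the-envelope estimate — the layer/head numbers below are purely illustrative placeholders, not the actual Qwen 3.5 397B config:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   bytes_per_elem=2):
    """Estimate KV cache size for one sequence: keys and values
    (factor 2) per layer, per KV head, per head dim, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical config: 60 layers, 8 KV heads, head_dim 128, fp16 cache.
full = kv_cache_bytes(60, 8, 128, 262_144)      # e.g. full 256k context
reduced = kv_cache_bytes(60, 8, 128, 200_000)   # reduced to 200k
print(f"saved ~ {(full - reduced) / 2**30:.1f} GiB")
```

Even with made-up numbers, the point is that cutting the max context by tens of thousands of tokens can free double-digit GiB, which is the difference between fitting in ~106 GB and getting OOM-killed.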

neuron:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           119Gi       3.9Gi       114Gi       2.1Mi       1.7Gi       115Gi
Swap:             0B          0B          0B
synapse:~/spark-vllm-compose/worker1$ free -h
               total        used        free      shared  buff/cache   available
Mem:           121Gi       4.2Gi       116Gi       2.1Mi       2.3Gi       117Gi
Swap:             0B          0B          0B

So weird, each of my GX10s reports a different total value.

Edit: just ran `fwupdmgr update`; now both machines say 121Gi total.


Guess you noticed (the firmware update). The ~2GB difference comes from how much memory is reserved in UEFI firmware. NVIDIA reduced it from 4GB to 2GB (I think, going from memory), so the ~2GB gain hit the Founders Edition first and eventually made it through to the partner models. FYI.