Qwen/Qwen3.5-122B-A10B - Alibaba/Qwen thought about us... :-D

Seems Qwen also did something for us (single) Spark users this time:

Interesting sizes that have popped up so far: 122B-A10B, 35B-A3B, 27B.

Will try my luck with llm-compressor again to get the 122B squeezed into one Spark.

I assume GGUFs will be available shortly from unsloth & Co.

2 Likes

I expect this to be a very attractive model for single Spark use. Especially at 4 bits, with MTP already set up. With the recent active discussion around ideal 4 bit quants, Autoround vs AWQ vs NVFP4 will be very interesting to explore.
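As a rough illustration of why MTP is attractive here: with k draft tokens per step and a per-token acceptance probability p, speculative decoding yields about (1 - p^(k+1)) / (1 - p) tokens per target-model pass, under the usual simplification that acceptances are independent. A minimal sketch (the acceptance probabilities are made-up illustrations, not benchmarks):

```python
# Expected tokens emitted per target-model forward pass with k speculative
# (MTP-style) draft tokens and per-token acceptance probability p,
# assuming independent acceptances: (1 - p**(k+1)) / (1 - p).
def expected_tokens(p: float, k: int) -> float:
    return (1 - p ** (k + 1)) / (1 - p)

for p in (0.6, 0.8):
    print(f"p={p}: {expected_tokens(p, 2):.2f} tokens per pass")
# p=0.6: 1.96 tokens per pass
# p=0.8: 2.44 tokens per pass
```

So with two speculative tokens, even a modest acceptance rate roughly doubles the tokens you get per pass of the big model.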

1 Like

Indeed, the GGUF fine-tuned model is already in.

I’m starting to test the 35B model for inclusion in sparkrun’s default recipes. Been waiting for the smaller size Qwen3.5 models to drop!

The 122B one should be a good replacement for gpt-oss-120b, but we need to wait for suitable quants.

3 Likes

The 120B is 250 GB. Not sure if my HW can handle this.
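A quick back-of-envelope check on that figure (pure arithmetic; it ignores KV cache, activations, and runtime overhead, and the 4.25 bits/weight for 4-bit formats is my assumption to account for group scales):

```python
PARAMS = 122e9  # approximate total parameter count of the 122B-A10B

def weights_gb(bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("BF16", 16), ("FP8", 8), ("~4-bit + scales", 4.25)]:
    print(f"{name:>15}: {weights_gb(bits):6.1f} GB")
# BF16 comes out around 244 GB, consistent with the ~250 GB checkpoint;
# FP8 (~122 GB) or a 4-bit quant (~65 GB) is what fits in 128 GB.
```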

Currently trying to push the 35B through llm-compressor, but:

  File "/data/quant/src/llm-compressor/src/llmcompressor/utils/dev.py", line 15, in <module>
    from transformers.modeling_utils import TORCH_INIT_FUNCTIONS
ImportError: cannot import name 'TORCH_INIT_FUNCTIONS' from 'transformers.modeling_utils' (/data/quant/src/llm-compressor/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py). Did you mean: 'ROPE_INIT_FUNCTIONS'?

Even with the latest transformers (5.3.0.dev0). Funny. Red Hat AI managed to quantize the big beast Qwen/Qwen3.5-397B-A17B to FP8.

EDIT: No Transformers v5 support yet.
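If the breakage really is just that constant being renamed/removed in v5, a compatibility shim along these lines might tide things over until llm-compressor gains v5 support. Purely a sketch under my assumptions: `TORCH_INIT_FUNCTIONS` existed in Transformers 4.x as a dict of torch init functions, and I'm assuming nothing downstream actually needs its contents:

```python
# Compatibility sketch: tolerate the constant missing on Transformers v5
# (in 4.x it was a dict mapping init-function names to torch callables).
try:
    from transformers.modeling_utils import TORCH_INIT_FUNCTIONS
except ImportError:  # renamed/removed upstream, or transformers absent
    TORCH_INIT_FUNCTIONS = {}
```

For llm-compressor itself you would have to inject the fallback into `transformers.modeling_utils` before importing `llmcompressor`; pinning `transformers<5` is the less hacky route.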

try mxfp6?

I’d wait for the quants, at least FP8 - we should get a native one from Qwen soon, I believe.

1 Like

I asked the Red Hat AI team what they did with the 397B.

I assume Alibaba’s Qwen team will also provide an FP8, as they did for that beast.


I am on my way to test it for spark arena & publish the recipe for the @eugr Spark VLLM container.
Apparently it needs some adjustments to run (the latest transformers, indeed).

3 Likes

Let’s go! @raphael.amorim

3 Likes

LET’S GOOOOOOOOO 😂

2 Likes

Day 0 😂

1 Like

I can’t wait for NVFP4 quant!

1 Like

Does anybody have links to guides that will help me get ready to run this once the quant is available? I had been running qwen3-vl-30b and would love to replace it with this, but I’ll confess, getting 3-vl up and running was a nightmare a few months ago, and I’d hoped there were simpler options by now (it just seems like every version of everything I install is wrong for the chipset, etc.).

cyanwiki already has the first quant up:

You could try that one:

vllm serve Qwen/Qwen3.5-27B --port 8000  --max-model-len 262144 --reasoning-parser qwen3 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
2 Likes

You can use our community docker builds: GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks

1 Like

Something is wrong with that quant. It’s 30 GB for a 27B model. It could be that more than half of the weights were left unquantized, or there was some issue during quantization.
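One quick way to sanity-check a quant is to back out the effective bits per weight from the file size (rough math; it ignores metadata and embeddings):

```python
def bits_per_weight(size_gb: float, params_b: float) -> float:
    """Effective bits per weight implied by checkpoint size on disk."""
    return size_gb * 1e9 * 8 / (params_b * 1e9)

print(f"{bits_per_weight(30, 27):.1f} bits/weight")
# ~8.9 bits/weight: that is 8-bit territory, not the ~4.5 you'd expect
# from a 4-bit quant with scales (27B at 4 bits would be ~13.5 GB).
```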

Thanks, I’ll try out the docker approach; that should help with stability. I’d imagined things had gotten a bit more mature by now.

This is the dense model, isn’t it? Not the MoE model. Would love an FP8 or NVFP4 quant of Qwen3.5-35B-A3B.