Been jamming away on OpenCode connected to Qwen3.5-122b on dual Sparks. It’s unbelievably good. I am not a great coder, mostly because I am too slow and methodical. The quality of code I’m generating with this stack was unimaginable to me a year ago.
Currently I’m in the same headspace here, although I’m still battling with the orchestration & tooling (sandboxing, separation, etc.). So not very productive for real-world tasks yet, but the potential is there and seems truly amazing.
Yeah, you are probably approaching it with a bit more formality than me. I’m actually doing the dumb thing of building out features as quickly as possible and seeing if I can then get that into proper software architecture for product. Why now?
Fully agree. Got my Spark last week, and managed quite easily (thanks to this community’s help) to get Qwen3.5 122B int4 AutoRound running on my single node with roughly 25 GB to spare. Speeds are more than adequate for my OpenCode usage, and so far I’m also happy with the quality.
I’m with you. We started on an NVIDIA 3090 Ti, added a DGX Spark, and are looking forward to a second node.
Testing with friends and family, just exploring the space. We also see the benchmark issue: we created and tested real-world support tasks with tool calling as an internal benchmark, and so on. We tried the work done here and are thankful to the contributors. We need an LLM fight arena as a benchmark: imagine adding your v1 endpoint to pit against other models on a task, and after the task you pick the winner of the 1-on-1. A mixture of speed and difficulty would decide which model is the one right now.
What are your current working settings for OpenCode with Intel/Qwen3.5-122B-A10B-int4-AutoRound? Is the system prompt fix for tool calling with Qwen models still needed? I have it all working pretty stably with Qwen3-Coder-Next-FP8. Today I decided to try out Qwen3.5 instead, but it seems to have problems using tools properly; even a simple web search in OpenWebUI fails… So I’m curious what is working best for you in terms of config. Thanks for any hints in advance.
OK… I’ve done my research, and yes, the system prompt fix is still needed; with it, OpenCode works pretty well again.
Sorry, didn’t see your question over the weekend. Glad you got it working.
From the qwen3.5 122b topic, this is what I use:
./launch-cluster.sh -t vllm-node-tf5 --apply-mod mods/fix-qwen3.5-autoround -e VLLM_MARLIN_USE_ATOMIC_ADD=1 --solo exec vllm serve Intel/Qwen3.5-122B-A10B-int4-AutoRound --max-model-len auto --gpu-memory-utilization 0.7 --port 8888 --host 0.0.0.0 --load-format fastsafetensors --enable-prefix-caching --enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3 --max-num-batched-tokens 8192 --trust-remote-code
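Since the command above enables --enable-auto-tool-choice with the qwen3_coder parser, a quick way to sanity-check tool calling is to POST an OpenAI-style chat completion request with a tool definition to vLLM’s /v1/chat/completions endpoint. Here’s a minimal sketch in Python that just builds the request body (the `get_weather` tool is a hypothetical example for testing, not part of the setup above; only the standard library is used):

```python
import json

# Build an OpenAI-style chat completion request with one tool attached,
# in the shape accepted by vLLM's OpenAI-compatible server.
payload = {
    "model": "Intel/Qwen3.5-122B-A10B-int4-AutoRound",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, for testing only
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

# Serialize for sending with curl or urllib.request.
body = json.dumps(payload)
```

Send `body` to http://localhost:8888/v1/chat/completions (matching the --port above); if the parser and the system prompt fix are doing their job, the response’s `choices[0].message.tool_calls` should contain a structured `get_weather` call instead of the tool call leaking into plain text.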
As you can see, and as you already said, it uses the fix.
I am naturally using eugr’s spark-vllm-docker.
If the use case is purely text generation (i.e. no need for image or video input), you can also reduce the memory footprint by adding --language-model-only, which skips the vision encoder and multi-modal profiling. Ref: Qwen/Qwen3.5-122B-A10B Model Card > vLLM > Text-Only.
I’ve tried the --language-model-only flag with the Intel/Qwen3.5-122B-A10B-int4-AutoRound quant, and indeed for text-only use it leaves more RAM for a larger KV cache and other things.