Qwen introduces FlashQLA - high-performance linear attention kernels built on TileLang

djordjestojanovic1992 · April 29, 2026, 2:21pm

Hello people,
have you seen this?

Learn more:

📖 Blog: https://qwen.ai/blog?id=flashqla

💻 Code: https://github.com/QwenLM/FlashQLA

Apparently it is 2x faster than FlashInfer.
Has anybody already tested it with eugr's vLLM Spark container, or will this even be relevant for us?
Thanks.

norman.2 · April 29, 2026, 2:45pm

Cc @Albond maybe something to look into? :)

djordjestojanovic1992 · April 29, 2026, 2:54pm

@eugr maybe interesting for you as well? :D

mclenithan · April 29, 2026, 2:56pm

Requirements: SM90, CUDA 12.8+, PyTorch 2.8+.

GB10s are SM121, sorry y’all.

norman.2 · April 29, 2026, 2:57pm

But we also run SM89 Marlin kernels for some stuff, so is this a hard requirement? :D

jwarner · April 29, 2026, 3:23pm

Requirements: “SM90 or above”, so it isn’t specific to SM90. In fact, there’s good reason to believe this will be helpful to us; Hopper-era optimizations for FP8 deliver basically identical scaling on GB10.
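
For anyone who wants to check their own box against the numbers quoted above (SM90 or above, CUDA 12.8+, PyTorch 2.8+), here is a minimal sketch. The helper names `parse_version` and `check_requirements` are made up for illustration and are not part of FlashQLA. It only compares version numbers: passing it does not prove the kernels will build or run on SM121, since a newer architecture does not necessarily support every Hopper-specific instruction.

```python
import re

# Minimums as quoted in this thread; treat them as assumptions and
# re-check the repo README before relying on them.
MIN_SM = (9, 0)      # "SM90 or above"
MIN_CUDA = (12, 8)   # CUDA 12.8+
MIN_TORCH = (2, 8)   # PyTorch 2.8+


def parse_version(text):
    """Return the leading (major, minor) of a version string like '2.8.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements(capability, cuda_version, torch_version):
    """Compare a device and toolchain against the stated minimums.

    capability is a (major, minor) tuple, e.g. (12, 1) for SM121.
    Returns a dict mapping requirement name to True/False.
    """
    return {
        "sm": tuple(capability) >= MIN_SM,
        "cuda": parse_version(cuda_version) >= MIN_CUDA,
        "torch": parse_version(torch_version) >= MIN_TORCH,
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        # torch.version.cuda is None on non-CUDA builds, so fall back to "0.0".
        cuda = torch.version.cuda or "0.0"
        print(check_requirements(cap, cuda, torch.__version__))
    else:
        print("No CUDA-enabled PyTorch found; nothing to check.")
```

On a GB10, `torch.cuda.get_device_capability(0)` should report `(12, 1)`, which clears the numeric SM check; whether the TileLang kernels actually compile for that target is a separate question that only a real build can answer.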

eugr · April 29, 2026, 5:18pm

Looks interesting, I’ll definitely check it out when I have more time for this.

djordjestojanovic1992 · April 29, 2026, 5:44pm

I wonder how big your to-do list is :D In every other thread I see you mentioned with a request to check something out. Take care of yourself.

eugr · April 29, 2026, 6:06pm

lol, pretty big. I’m currently wrapping up some unrelated projects, so can’t be as active with the project and forums as I would like to, but I will have more time in the coming weeks :)

Albond · April 29, 2026, 6:31pm

Thanks, bookmarked it. Unfortunately my DGX Spark is tied up with a fine-tuning run for the next week, so I’ll take a proper look once it’s free. I’ll try to review it on my Mac in the meantime, though I’m not sure that’ll be enough to tell whether it’s worth integrating into the DGX Spark setup.

mclenithan · April 30, 2026, 7:07am

The repo tells a different story than the blog, good to know.

mclenithan · May 2, 2026, 8:23pm
