Only got 50 TPS on Qwen3.5 35B A3B FP8

So I tried running Qwen3.5 35B A3B FP8 and expected at least 70-80 tokens/sec, but I only got 50 tokens/sec.

Disclaimer: I know people got the same result on the Spark Arena LLM Leaderboard; I just want to correct my way of thinking about expected token throughput.

From the articles I've read, I understand that tokens/sec is directly limited by memory bandwidth: the Spark has 273 GB/s, and every generated token requires streaming the active model weights from memory to the compute cores. That gives the equation:

TPS = Memory Bandwidth (GB/s) / Weights Read per Token (GB)

Qwen3.5 35B A3B FP8 has 3B active parameters at 1 byte per parameter, so that's roughly 3 GB of weights read per token.

So plugging into the equation, I should get:

TPS = 273 GB/s / 3 GB per token ≈ 91
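To sanity-check my own arithmetic, here is the same roofline estimate as a tiny script. It assumes decode is purely bandwidth-bound (only the active expert weights are streamed per token), which is exactly the simplification in the equation above:

```python
# Roofline estimate for decode throughput, assuming decoding is purely
# memory-bandwidth-bound: each token streams the active weights once.
BANDWIDTH_GBPS = 273.0   # Spark memory bandwidth (GB/s)
ACTIVE_PARAMS_B = 3.0    # active parameters per token (billions), the "A3B"
BYTES_PER_PARAM = 1.0    # FP8 = 1 byte per parameter

weights_gb_per_token = ACTIVE_PARAMS_B * BYTES_PER_PARAM  # ~3 GB per token
tps_ceiling = BANDWIDTH_GBPS / weights_gb_per_token
print(f"theoretical ceiling: {tps_ceiling:.0f} tok/s")  # prints 91
```

This is a ceiling, not a prediction: it ignores everything except the weight reads.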

So the theoretical ceiling should be ~91 tokens/sec. I understand there is overhead from MoE routing, KV cache reads, etc., but 50/91 means I'm only reaching about 55% of that ceiling, i.e. roughly 45% overhead. Is that a normal amount of overhead, or am I missing something?
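One effect I can at least quantify: besides the weights, each decode step also reads the KV cache, so the effective ceiling drops as context grows. A minimal sketch of that, where the architecture numbers (layers, KV heads, head dim) are made-up placeholders and not Qwen3.5's actual config:

```python
# Extending the roofline: per decode step, memory traffic = active weights
# plus the KV cache for the current context. Architecture numbers below are
# hypothetical placeholders, NOT the real Qwen3.5 config.
BANDWIDTH_GBPS = 273.0
WEIGHTS_GB = 3.0          # active FP8 weights read per token

LAYERS = 48               # hypothetical
KV_HEADS = 4              # hypothetical (GQA)
HEAD_DIM = 128            # hypothetical
KV_BYTES = 1              # assume FP8 KV cache

def tps_ceiling_at_context(context_len: int) -> float:
    # K and V vectors, per layer, for every cached token
    kv_gb = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES * context_len / 1e9
    return BANDWIDTH_GBPS / (WEIGHTS_GB + kv_gb)

for ctx in (0, 4096, 16384):
    print(f"{ctx:>6} tokens of context -> {tps_ceiling_at_context(ctx):.0f} tok/s ceiling")
```

With these placeholder numbers the ceiling falls from 91 tok/s at empty context to the low 70s at 16K context, so KV cache traffic alone explains part of the gap, though probably not all of it.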

Please feel free to correct me if I'm wrong; I need it to fix my understanding of LLM inference.

Best regards.

Thanks, I will follow it up.