Why 200 tok/s is the new normal: TP=2 does scale after all

Qwen 3.5 122B (INT4 AutoRound) after 1h of devops and code tree analysis: 10% of the experts were never used.

full story: [https://forums.developer.nvidia.com/t/why-you-should-rip-it-yourself-live-moe-expert-pruning-in-vllm/]
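For anyone who wants to check a "never used" figure on their own workload, here is a minimal sketch of the counting step only. It assumes you have already logged, per token, which expert ids the router picked; the `unused_experts` helper and the toy log are hypothetical and are not taken from the linked post or from vLLM.

```python
from collections import Counter

def unused_experts(routing_log, num_experts):
    """Return the sorted ids of experts that never appear in routing_log.

    routing_log: iterable of per-token tuples holding the expert ids
    selected by the router (top-k routing).
    """
    counts = Counter(e for token_experts in routing_log for e in token_experts)
    return sorted(set(range(num_experts)) - set(counts))

# Hypothetical toy data: 8 experts, top-2 routing, and a router that only
# ever picks experts 0..5 (a stand-in for a skewed real workload).
log = [(i % 6, (i + 1) % 6) for i in range(1000)]

never = unused_experts(log, num_experts=8)
print(never, f"{len(never) / 8:.0%} never used")  # [6, 7] 25% never used
```

In a real MoE model the count would be kept per layer, since an expert id that is idle in one layer may be busy in another.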

Tried to read your post, but access is not allowed :( I'm very interested in REAP + SFT for creating my own models by pruning the unneeded parts and adding project-focused ones.

Really, not allowed?

Same here; it was accessible yesterday.