Qwen 3.5 122B INT4 AutoRound after 1h of devops and code tree analysis: 10% of experts never used.
full story: [https://forums.developer.nvidia.com/t/why-you-should-rip-it-yourself-live-moe-expert-pruning-in-vllm/]
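
rough sketch of how a "never used" number like that can be computed, assuming you have already logged the router's top-k expert ids per token for one MoE layer. The function name and the toy data are made up for illustration; this is not vLLM's API and not necessarily how the linked post measures it.

```python
from collections import Counter
from typing import Iterable, Sequence


def unused_expert_fraction(
    routed: Iterable[Sequence[int]], num_experts: int
) -> tuple[float, list[int]]:
    """Return (fraction of experts never routed to, their ids).

    `routed` holds one sequence of selected expert ids per token
    (hypothetical log format, assumed for this sketch).
    """
    hits = Counter()
    for token_experts in routed:
        hits.update(token_experts)
    # Counter returns 0 for ids it has never seen.
    unused = [e for e in range(num_experts) if hits[e] == 0]
    return len(unused) / num_experts, unused


# Toy data: 10 experts, top-2 routing; expert 7 is never selected.
routed = [(0, 1), (2, 3), (4, 5), (6, 8), (9, 0), (1, 2)]
fraction, unused = unused_expert_fraction(routed, num_experts=10)
print(f"{fraction:.0%} never used: {unused}")
```

the result only says something about the workload that was logged; an expert that is idle here may still fire on other prompts.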
