Qwen 3.5 122B INT4 AutoRound after 1h of devops and code tree analysis: 10% of the experts were never used.
full story: [https://forums.developer.nvidia.com/t/why-you-should-rip-it-yourself-live-moe-expert-pruning-in-vllm/]
Tried to read your post, not allowed :( Very interested in REAP + SFT to create my own models by pruning unneeded parts and adding project-focused ones.
Really, not allowed?
Same here, it was accessible yesterday.