How do I run Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled on vllm community docker?

Performance-wise, you’re probably better off running a quantized version or similar. To answer your original question, though, here is a sparkrun recipe that builds on top of @eugr’s vllm docker repo.

sparkrun run @sparkrun-testing/jackrong-qwen3.5-27b-claude4.6-distill-vllm

The @sparkrun-testing/ prefix is required for “hidden” registries. I use those so that I can deploy recipes for specific uses without them showing up in the default tab completion, etc.

You can check out the recipe file on GitHub in the dbotwinick/sparkrun-recipe-registry repo (main branch): testing/recipes/qwen3.5/exotic/jackrong-qwen3.5-27b-claude4.6-distill-vllm.yaml

I tested that it ran and saw ~4.5 tok/s, so not terribly impressive performance with single-node tensor parallel, but it’s interesting to see this new wave of Opus distillation models! (Note that a 27B dense model at BF16 has a theoretical peak throughput of ~5.1 tok/s on a single Spark.)
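
For anyone wondering where the ~5.1 tok/s ceiling comes from, here is a rough back-of-envelope sketch. It assumes decode is memory-bandwidth bound (every generated token reads all the weights once) and it assumes ~273 GB/s of memory bandwidth for a single Spark; that bandwidth number is not from the post above, so swap in your own if it differs. The function name is made up for illustration.

```python
def peak_decode_tok_per_s(params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on decode speed if each token streams all weights once.

    Ignores KV cache reads, activations, and any compute limits, so real
    throughput will land below this number.
    """
    weight_gb = params_billion * bytes_per_param  # 27B * 2 bytes = 54 GB at BF16
    return bandwidth_gb_s / weight_gb

# Assumption: ~273 GB/s memory bandwidth for one Spark (not stated in the post).
peak = peak_decode_tok_per_s(params_billion=27, bytes_per_param=2, bandwidth_gb_s=273)
observed = 4.5  # tok/s reported above

print(f"theoretical peak: {peak:.1f} tok/s")
print(f"observed / peak:  {observed / peak:.0%}")
```

Under those assumptions the observed ~4.5 tok/s is already close to the ceiling, which is why a quant (fewer bytes per parameter) is the main lever for more speed on this model.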

You can learn more about how to install sparkrun in the forum thread “Sparkrun - central command with tab completion for launching inference on Spark Clusters”, or check out the docs at https://sparkrun.dev. sparkrun is designed to make it easier to run models, and we’re working to make it easier to find recipes and understand baseline performance at spark-arena.com.