A faster version of DeepSeek V4 Pro / Flash just dropped.
Do we need to enable the speculative config in vLLM for this, or is it built into the model?
The weights incorporate the drafter, but inference engines will need to add support before they can actually make use of it.