Hello,
I tested DFlash on a Blackwell 6000 Pro to evaluate how speculative decoding affects generation throughput. With DFlash speculative decoding enabled, generation throughput reached ~429.69 tokens/s, approximately 4.8× the ~90.20 tokens/s observed without speculative decoding.
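As a quick sanity check on the 4.8× figure, here is a minimal sketch that pulls the throughput values out of the two decode log lines quoted in this post and divides them. The helper names `parse_field` and `speedup` are made up for illustration and are not part of SGLang; the regex assumes the `name: value` layout seen in these two log lines. Note that the two launch commands in this post also differ in `--tp-size` (2 vs. 1), so the ratio reflects both changes, not speculative decoding alone.

```python
import re

# Decode log lines copied verbatim from the two runs in this post.
SPEC_LOG = (
    "[2026-02-07 14:38:41] Decode batch, #running-req: 1, #token: 492, "
    "token usage: 0.00, accept len: 6.10, accept rate: 0.61, cuda graph: True, "
    "gen throughput (token/s): 429.69, #queue-req: 0,"
)
BASE_LOG = (
    "[2026-02-07 14:42:55] Decode batch, #running-req: 1, #token: 568, "
    "token usage: 0.00, cuda graph: True, "
    "gen throughput (token/s): 90.20, #queue-req: 0,"
)


def parse_field(line, name):
    """Return the number that follows 'name: ' in a log line, or None if absent.

    Hypothetical helper: assumes the 'name: value' layout shown above.
    """
    match = re.search(re.escape(name) + r":\s*([0-9]+(?:\.[0-9]+)?)", line)
    return float(match.group(1)) if match else None


def speedup(spec_line, base_line):
    """Ratio of generation throughput between two decode log lines."""
    field = "gen throughput (token/s)"
    spec = parse_field(spec_line, field)
    base = parse_field(base_line, field)
    if spec is None or base is None:
        raise ValueError("throughput field not found in log line")
    return spec / base


if __name__ == "__main__":
    print(f"speedup: {speedup(SPEC_LOG, BASE_LOG):.2f}x")
    print(f"accept len (speculative run): {parse_field(SPEC_LOG, 'accept len')}")
```

These are single-request snapshots (`#running-req: 1`), so the ratio describes this one prompt rather than a general benchmark result.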
With speculative decoding enabled:
export SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1
python -m sglang.launch_server \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--speculative-algorithm DFLASH \
--speculative-draft-model-path z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat \
--tp-size 2 \
--dtype bfloat16 \
--attention-backend flashinfer \
--mem-fraction-static 0.75 \
--trust-remote-code
[2026-02-07 14:38:41] Decode batch, #running-req: 1, #token: 492, token usage: 0.00, accept len: 6.10, accept rate: 0.61, cuda graph: True, gen throughput (token/s): 429.69, #queue-req: 0,
Output:
Quantum computing is a revolutionary technology that uses the principles of quantum mechanics to perform calculations and operations on data. Here are some key aspects:
**Key Features:**
1. **Quantum Bits (Qubits):** Qubits are the fundamental units of quantum information, unlike classical bits which can only be 0 or 1. Qubits can exist in multiple states simultaneously, known as a superposition.
2. **Entanglement:** Qubits can become "entangled," meaning their properties are connected, even when separated by large distances.
3. **Quantum Gates:** Quantum gates are the quantum equivalent of logic gates in classical computing. They perform operations on qubits, such as rotations and entanglement.
4. **Superposition:** Qubits can exist in multiple states simultaneously, allowing for parallel processing of vast amounts of data.
**Advantages:**
1. **Exponential Scaling:** Quantum computers can solve certain problems exponentially faster than classical computers.
2. **Simulation:** Quantum computers can simulate complex systems, such as molecules and materials, which can lead to breakthroughs in fields like chemistry and materials science.
3. **Cryptography:** Quantum computers can break certain classical encryption algorithms, but they can also be used to create unbreakable quantum encryption.
**Challenges:**
1. **Error Correction:** Quantum computers are prone to errors due to the fragile nature of qubits.
2. **Scalability:** Currently, quantum computers are small-scale and need to be scaled up to perform practical calculations.
3. **Noise:** Quantum computers are sensitive to noise, which can cause errors and affect the accuracy of calculations.
**Applications:**
1. **Optimization:** Quantum computers can be used to optimize complex systems, such as logistics and finance.
2. **Machine Learning:** Quantum computers can be used to speed up machine learning algorithms and improve their accuracy.
3. **Cryptography:** Quantum computers can be used to create unbreakable encryption and secure communication.
**Current Status:**
1. **Research:** Quantum computing is an active area of research, with many companies and organizations working on developing quantum computers.
2. **Prototypes:** Several quantum computer prototypes have been built, including IBM's Quantum Experience and Google's Bristlecone.
3. **Commercialization:** Quantum computing is expected to become more commercialized in the coming years, with companies like IBM, Google, and Microsoft leading the charge.
Overall, quantum computing has the potential to revolutionize many fields and solve complex problems that are currently unsolvable with classical computers. However, significant technical challenges need to be overcome before quantum computing can become a practical reality.
Without speculative decoding:
python -m sglang.launch_server \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--tp-size 1 \
--dtype bfloat16 \
--attention-backend flashinfer \
--mem-fraction-static 0.75 \
--trust-remote-code
[2026-02-07 14:42:55] Decode batch, #running-req: 1, #token: 568, token usage: 0.00, cuda graph: True, gen throughput (token/s): 90.20, #queue-req: 0,
Output:
Quantum computing is a new and rapidly evolving field of computer science that uses the principles of quantum mechanics to perform calculations and operations on data. Here are some key aspects:
**Key Principles:**
1. **Qubits**: Quantum computers use quantum bits or qubits, which can exist in multiple states (0, 1, or both) simultaneously, unlike classical bits which can only be 0 or 1.
2. **Superposition**: Qubits can exist in a state of superposition, meaning they can process multiple possibilities simultaneously.
3. **Entanglement**: Qubits can become entangled, allowing them to be connected and correlated in a way that enables faster processing of complex calculations.
**Advantages:**
1. **Exponential scaling**: Quantum computers can process exponentially more information than classical computers for certain problems.
2. **Faster simulations**: Quantum computers can simulate complex systems, such as molecules and materials, much faster than classical computers.
3. **Cryptography**: Quantum computers can break certain classical encryption algorithms, but they can also be used to create unbreakable quantum encryption.
**Challenges:**
1. **Error correction**: Quantum computers are prone to errors due to the fragile nature of qubits.
2. **Scalability**: Currently, quantum computers are small-scale and need to be scaled up to be practical.
3. **Control and calibration**: Maintaining control over qubits and calibrating the quantum computer is a significant challenge.
**Applications:**
1. **Optimization**: Quantum computers can optimize complex systems, such as logistics and finance.
2. **Materials science**: Quantum computers can simulate the behavior of materials, leading to breakthroughs in fields like chemistry and materials science.
3. **Cryptography**: Quantum computers can create unbreakable encryption, which is essential for secure communication.
**Companies and Research Institutions:**
1. **IBM Quantum**: A leading provider of quantum computing services and hardware.
2. **Google Quantum AI Lab**: A research initiative focused on developing quantum computing applications.
3. **Microsoft Quantum**: A research and development effort focused on quantum computing and its applications.
**Timeline:**
1. **1980s**: Quantum computing concepts were first proposed.
2. **1990s**: First quantum computers were built.
3. **2000s**: Quantum computing research accelerated.
4. **2010s**: First commercial quantum computing services were launched.
5. **2020s**: Quantum computing is becoming more accessible and practical.
This is a brief overview of quantum computing. If you have specific questions or want more information, feel free to ask!