Originally published at: Real-Time Decoding, Algorithmic GPU Decoders, and AI Inference Enhancements in NVIDIA CUDA-Q QEC | NVIDIA Technical Blog
Real-time decoding is crucial to fault-tolerant quantum computers. By enabling decoders to operate with low latency concurrently with a quantum processing unit (QPU), we can apply corrections to the device within the coherence time. This prevents errors from accumulating, which would otherwise reduce the value of the results. We can do this online, with a real quantum…
The plot says iterations/sec. Are those BP iterations (i.e. related to the max_iterations argument passed to the decoder), or decoding iterations, where each iteration is one syndrome being decoded?
Those are BP iterations, related to max_iterations, not full syndrome decode cycles. Also, note that the RelayBP algorithm can be configured to run multiple legs, with each leg having up to max_iterations iterations.
We report BP iterations per second rather than full decodes per second because the number of BP iterations each syndrome needs depends heavily on the syndrome itself. Any timing for full decode cycles is therefore only valid for a very specific error rate under a very specific noise model. The number of BP iterations per second, by contrast, is relatively constant (at least for a given PCM size), which makes it a more reproducible timing metric.
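To make that concrete, here is a rough sketch of how one might turn an iterations/sec figure into per-syndrome decode times. Everything numeric in it is made up for illustration: the throughput value, the list of iteration counts, and the `max_iterations` / `num_legs` settings are assumptions, not measurements or defaults from the blog or from CUDA-Q QEC. It also assumes the throughput really is constant across syndromes, which is only approximately true.

```python
from statistics import mean


def decode_time_us(num_iterations: int, iterations_per_sec: float) -> float:
    """Estimated time to decode one syndrome, in microseconds."""
    return num_iterations * 1e6 / iterations_per_sec


def worst_case_iterations(max_iterations: int, num_legs: int) -> int:
    """Upper bound on BP iterations when every leg runs to its cap."""
    return max_iterations * num_legs


# Hypothetical throughput, as one might read it off an iterations/sec plot.
ITERATIONS_PER_SEC = 2_000_000.0

# Hypothetical per-syndrome iteration counts: most syndromes converge
# quickly, a few need many more iterations.
iteration_counts = [3, 5, 4, 40, 6, 200]

times_us = [decode_time_us(n, ITERATIONS_PER_SEC) for n in iteration_counts]
mean_time_us = mean(times_us)

# Hypothetical decoder settings: 4 legs of up to 50 iterations each.
worst_time_us = decode_time_us(
    worst_case_iterations(max_iterations=50, num_legs=4), ITERATIONS_PER_SEC
)

print("per-syndrome times (us):", times_us)
print("mean time (us):", mean_time_us)
print("worst-case time (us):", worst_time_us)
```

The spread in the per-syndrome times comes entirely from the iteration counts, which is why a single "decodes per second" number would only describe one particular error rate and noise model.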
Hope this helps!
Makes sense. It’s easy to get back the actual iteration counts from the decoder using the "num_iter" additional option.
Was the experiment a circuit level noise simulation or something simpler, like code capacity where DEMs were not used?
@ae_pascal It just occurred to me that these timings were collected with a release candidate for our 0.6 release, which has yet to be released. (We originally planned on releasing in January but had to hold back for reasons unrelated to the decoder.) I think we need to update the blog to state this clearly. The 0.5.0 version of this decoder is slower. Apologies for the inconvenience and potential confusion on this.
How much of a performance difference is there between 0.5.0 and 0.6? We’re evaluating the framework on different codes.
