Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling

Originally published at: Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling | NVIDIA Technical Blog

As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is emerging. Also known as AI reasoning or long-thinking, this technique improves model performance by allocating additional computational resources during inference to evaluate multiple possible outcomes and then selecting the best one, neural…
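The excerpt describes inference-time scaling as sampling multiple candidate outcomes and keeping the best one. A minimal best-of-N sketch of that idea is below; `generate_candidate` and `score_candidate` are hypothetical stubs standing in for the DeepSeek-R1 call and the kernel verifier described in the post, not real APIs:

```python
import random

def generate_candidate(prompt: str, seed: int) -> str:
    """Hypothetical stub for a model call (e.g. DeepSeek-R1).

    Returns one of several mock kernel variants; in a real workflow
    this would be a sampled completion from the model."""
    rng = random.Random(seed)
    return f"kernel_variant_{rng.randint(0, 9)}"

def score_candidate(candidate: str) -> float:
    """Hypothetical stub for the verifier: in practice this would
    compile the generated kernel, check correctness, and benchmark it.
    Here, higher suffix = better score."""
    return float(candidate.rsplit("_", 1)[-1])

def best_of_n(prompt: str, n: int) -> str:
    """Inference-time scaling via best-of-N sampling: spend more
    compute (larger n) to draw more candidates, then keep the one
    the verifier scores highest."""
    candidates = [generate_candidate(prompt, seed) for seed in range(n)]
    return max(candidates, key=score_candidate)

print(best_of_n("write an attention kernel", n=16))
```

Increasing `n` is the "additional computational resources during inference" knob: more samples cost more compute but raise the chance that at least one candidate is fast and correct.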

This is very exciting!

What is NVIDIA's plan for releasing these AI-generated kernels? For example, will they ship in an upcoming TensorRT release? Or can we expect plugins to be released that we can use to accelerate ML model inference?

Thanks

Hi. Nice idea.
Can you please provide more details on what the verifier checks and how the prompt is refined? Is it just by appending the feedback from the verifier?
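The post does not spell out the refinement mechanism, but the reading suggested in the question above, appending the verifier's feedback to the prompt each round, can be sketched as a simple closed loop. Everything here (function names, the bank-conflict failure mode) is a hypothetical illustration, not the blog's actual implementation:

```python
def generate_kernel(prompt: str) -> str:
    """Hypothetical stub for the model call. Pretend the model only
    produces a correct kernel once the prompt mentions the earlier
    failure mode."""
    if "shared-memory bank conflict" in prompt:
        return "fixed_kernel"
    return "naive_kernel"

def verify(kernel: str):
    """Hypothetical stub verifier: returns (passed, feedback).
    A real verifier might compile, run numerical checks, and profile."""
    if kernel == "naive_kernel":
        return False, "shared-memory bank conflict detected"
    return True, ""

def refine_loop(prompt: str, max_rounds: int = 5) -> str:
    """Closed-loop generation: generate, verify, and on failure fold
    the verifier's feedback back into the prompt before retrying."""
    kernel = ""
    for _ in range(max_rounds):
        kernel = generate_kernel(prompt)
        ok, feedback = verify(kernel)
        if ok:
            return kernel
        # One plausible refinement strategy: append the feedback.
        prompt += f"\nPrevious attempt failed: {feedback}"
    return kernel

print(refine_loop("write a matmul kernel"))  # prints fixed_kernel
```

In this toy run the first attempt fails verification, the feedback is appended to the prompt, and the second attempt passes. Whether the actual pipeline appends raw feedback, summarizes it, or restructures the prompt is exactly what the question asks and is not stated in the post.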