Is there any way to suspend the kernel?

Hey,

I am new to CUDA and am writing an application with a long-running kernel. If possible, I would like to suspend the kernel and resume it later. Does anybody know how I can do that?

Thank you in advance.

Not that I am aware of, no. There is no graceful way to stop a running kernel from the host and no way to save or restore the state necessary to resume a kernel either. It would be nice if it were possible, though.

I guess you could program something like that using a flag in mapped host memory that is checked by each newly started block.

You would have to do all the data saving and restarting logic yourself, though.
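For what it's worth, here is a minimal sketch of the flag idea, not a tested recipe. It assumes the work splits into independent blocks, and the names `work`, `done`, and `run_with_stop_flag` are made up for illustration. The host sets a flag in mapped pinned memory, each block reads it once when it starts, and a `done` array records which blocks finished so that a second launch only redoes the ones that were skipped. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block polls the host-visible stop flag once, when it starts.
// Blocks that see the flag set (or that already ran) exit immediately.
__global__ void work(const volatile int *stop, int *done, float *data, int n)
{
    __shared__ int skip;
    if (threadIdx.x == 0)
        skip = done[blockIdx.x] || *stop;
    __syncthreads();
    if (skip) return;              // uniform per block, so the barrier below is safe

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;    // stand-in for the real per-block work

    __syncthreads();
    if (threadIdx.x == 0) done[blockIdx.x] = 1;   // mark this block complete
}

// Hypothetical driver: returns how many elements were incremented exactly once.
int run_with_stop_flag(int n)
{
    const int threads = 256, blocks = (n + threads - 1) / threads;

    // Flag lives in pinned host memory that the device can read directly.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    int *h_stop = nullptr, *d_stop = nullptr;
    cudaHostAlloc((void **)&h_stop, sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&d_stop, h_stop, 0);
    *h_stop = 0;

    float *data = nullptr;
    int *done = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    cudaMalloc(&done, blocks * sizeof(int));
    cudaMemset(data, 0, n * sizeof(float));
    cudaMemset(done, 0, blocks * sizeof(int));

    // First launch: ask for suspension while the kernel may still be running.
    work<<<blocks, threads>>>(d_stop, done, data, n);
    *(volatile int *)h_stop = 1;
    cudaDeviceSynchronize();
    // At this point the application could save data/done to disk and exit.

    // Resume: clear the flag and relaunch; finished blocks skip themselves.
    *(volatile int *)h_stop = 0;
    work<<<blocks, threads>>>(d_stop, done, data, n);
    cudaDeviceSynchronize();

    std::vector<float> h(n);
    cudaMemcpy(h.data(), data, n * sizeof(float), cudaMemcpyDeviceToHost);
    int ok = 0;
    for (float v : h) ok += (v == 1.0f);

    cudaFree(data);
    cudaFree(done);
    cudaFreeHost(h_stop);
    return ok;
}

int main()
{
    const int n = 1 << 20;
    printf("%d of %d elements processed exactly once\n", run_with_stop_flag(n), n);
    return 0;
}
```

Note that this only stops blocks that have not started yet. A block that is already running finishes its work, so how quickly the kernel winds down depends on how long a single block takes.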

Wow, glad I found this post. I have been searching for similar functionality.

I have a CUDA app with a long-running kernel, and I want a way to suspend and resume it, similar to how BOINC can stop and start.

Looking at the Developer API I didn’t get far. It looks like the only way to make this possible would be to modify the CUDA source code itself to add the functionality.

Is that a correct assumption?

Thanks, tera. I will try that.

SingularMatrix, thank you for the advice. That is what I would like to do, but CUDA seems to be closed source. :(

Given that the “source” that manages running kernel threads is baked into the silicon on the hardware, you would not have much luck modifying it even if it were open. The best you can do is shorten your kernel launches and provide a save/restore mechanism on the host between kernel launches.

Yeah, there’s nothing fancy done by those apps. They complete a kernel, checkpoint on the CPU side, and quit.
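A rough sketch of that pattern, in case it helps. It assumes the long computation can be cut into short chunks whose complete state fits in one device buffer. The kernel `step`, the `Checkpoint` struct, and the helper names are invented for this example, and a toy integer recurrence stands in for the real work. Error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One short chunk of the long computation: advance every element by
// `iters` steps of a toy recurrence standing in for the real work.
__global__ void step(unsigned int *state, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int s = state[i];
    for (int k = 0; k < iters; ++k) s = s * 1664525u + 1013904223u;
    state[i] = s;
}

// Host-side checkpoint: the full state plus how far the run has progressed.
struct Checkpoint {
    std::vector<unsigned int> state;
    int chunks_done;
};

// Restore from cp, run chunks until `until` are done, saving after each launch.
void run_chunks(Checkpoint &cp, int until, int iters_per_chunk)
{
    const int n = (int)cp.state.size();
    const size_t bytes = n * sizeof(unsigned int);
    unsigned int *d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, cp.state.data(), bytes, cudaMemcpyHostToDevice);      // restore

    while (cp.chunks_done < until) {
        step<<<(n + 255) / 256, 256>>>(d, n, iters_per_chunk);
        // Blocking copy: waits for the kernel, then saves its result.
        cudaMemcpy(cp.state.data(), d, bytes, cudaMemcpyDeviceToHost);  // save
        ++cp.chunks_done;
        // A real application would write cp to disk here and check
        // whether it has been asked to quit before launching again.
    }
    cudaFree(d);
}

// True if a run interrupted after 4 of 10 chunks and then resumed
// ends in the same state as an uninterrupted run.
bool resume_matches_straight_run(int n)
{
    Checkpoint a{std::vector<unsigned int>(n), 0};
    for (int i = 0; i < n; ++i) a.state[i] = (unsigned int)i;
    Checkpoint b = a;

    run_chunks(a, 10, 1000);   // uninterrupted
    run_chunks(b, 4, 1000);    // "suspend" after 4 chunks
    run_chunks(b, 10, 1000);   // resume from the saved checkpoint
    return b.chunks_done == 10 && a.state == b.state;
}

int main()
{
    printf("resumed run matches: %s\n",
           resume_matches_straight_run(1 << 16) ? "yes" : "no");
    return 0;
}
```

The chunk length is the trade-off: shorter launches mean the app can stop sooner, at the cost of more launch and copy overhead.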