I recently added the ability to checkpoint CUDA applications and replay individual kernels through Ocelot. Rather than using this for reliability, the intention is to facilitate the automatic creation of CUDA benchmarks and regression tests.
I invite anyone who has an application that they wouldn’t mind contributing to check out Ocelot from here: Google Code Archive.
Build and install the trunk, link your application against Ocelot, set up a configure.ocelot file to enable trace capture, run your application, and post the resulting traces to this thread. Ideally, you should use the most recent version of NVCC and target the highest shader model (NVCC 4.0, -arch sm_23). Capture and replay should work on any Ocelot device (emulated, llvm, nvidia, or ati).
An example configure.ocelot checkpoint section is as follows:
checkpoint: {
    enabled: true,
    path: "../../tests/traces/ptx2.3/basic/",
    prefix: "UnstructuredMandelbrot_",
    suffix: ".checkpoint"
}
These traces create a snapshot of memory before and after kernel execution, so they can become extremely large. Fortunately, they are also typically very amenable to compression, so please compress your trace before posting. I have a 900MB trace that compressed down to 64KB using bzip2. If you have something huge that you still want to contribute, feel free to host it yourself and post a link to it.
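If you would rather script the compression step than run bzip2 by hand, here is a minimal Python sketch using the standard bz2 module. The file name `Example_0.checkpoint` and the mostly-zero contents are made up for the demo and are not a real Ocelot trace; the helper `compress_file` is likewise just an illustrative name.

```python
import bz2
import os
import shutil
import tempfile


def compress_file(src_path, dst_path, chunk_size=1 << 20):
    """Stream src_path into a bzip2 file at dst_path, one chunk at a time.

    Returns (original_size, compressed_size) in bytes.
    """
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return os.path.getsize(src_path), os.path.getsize(dst_path)


def demo():
    """Compress a fake, mostly-zero 'trace' and verify the round trip."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name; a stand-in for a real checkpoint file.
        src = os.path.join(tmp, "Example_0.checkpoint")
        with open(src, "wb") as f:
            f.write(b"\x00" * (8 * 1024 * 1024))  # untouched memory
            f.write(bytes(range(256)) * 16)       # a little real data
        before, after = compress_file(src, src + ".bz2")
        # Decompressing must give back exactly the original bytes.
        with bz2.open(src + ".bz2", "rb") as packed, open(src, "rb") as raw:
            assert packed.read() == raw.read()
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"{before} bytes -> {after} bytes")
```

Real traces will not shrink as dramatically as this synthetic one unless most of the captured memory is zero or highly repetitive, so treat the ratio here as illustrative only.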
Periodically I will consolidate the traces posted to this thread into regression test/benchmark suites and post them on the Ocelot website. Anyone is free to download and use them.
I’ll start out by posting a trace from a SQL inner-join benchmark. Join Trace
Some Caveats:
- Embedded pointers within global memory are not currently supported. If you do this, your application will produce a trace that will silently fail during execution. I don’t plan on adding support for this in the near future.
- The trace format captures the memory state before and after kernel execution as well as the PTX assembly code for each launched kernel. If you don’t want to release your source code, but are comfortable releasing a binary and a checkpoint of your memory state, this may be appealing to you.
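To make the embedded-pointer caveat concrete, here is a toy Python model of a checkpoint that saves raw bytes per allocation. It is not Ocelot's implementation: the `ToyDevice` class, the base addresses, and the assumption that allocations land at different addresses on replay are all mine, chosen only to show how an address stored inside a buffer can go stale while an index stored the same way stays meaningful.

```python
import struct


class ToyDevice:
    """Toy model of device memory: named allocations at increasing base addresses."""

    def __init__(self, first_base):
        self.next_base = first_base
        self.allocs = {}  # name -> (base address, bytearray contents)

    def malloc(self, name, size):
        base = self.next_base
        self.allocs[name] = (base, bytearray(size))
        self.next_base += size + 256  # arbitrary gap between allocations
        return base

    def write(self, name, payload):
        self.allocs[name][1][:len(payload)] = payload

    def is_valid(self, address):
        """True if address falls inside any live allocation."""
        return any(base <= address < base + len(buf)
                   for base, buf in self.allocs.values())


def checkpoint(device):
    """Save only the raw bytes of each allocation, not its address."""
    return {name: bytes(buf) for name, (_, buf) in device.allocs.items()}


def restore(snapshot, first_base):
    """Re-create the allocations on a fresh device with a different base."""
    device = ToyDevice(first_base)
    for name, raw in snapshot.items():
        device.malloc(name, len(raw))
        device.write(name, raw)
    return device


# Capture run: "node" stores both a raw pointer to "data" and an index into it.
capture = ToyDevice(first_base=0x1000)
data_ptr = capture.malloc("data", 64)
capture.malloc("node", 16)
capture.write("node", struct.pack("<QQ", data_ptr, 3))

# Replay run: same bytes, but the allocations now live at other addresses.
replay = restore(checkpoint(capture), first_base=0x9000)
stored_ptr, stored_index = struct.unpack("<QQ", bytes(replay.allocs["node"][1]))

pointer_ok = replay.is_valid(stored_ptr)                 # stale address
index_ok = stored_index < len(replay.allocs["data"][1])  # still meaningful
print(f"pointer valid: {pointer_ok}, index valid: {index_ok}")
```

In this model the stored address no longer points into any allocation after the restore, while the stored index still does. If your kernels can pass base pointers as kernel arguments and keep only indices or offsets in global memory, that may be a way to stay within what the trace format supports, though I have not verified this against Ocelot itself.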
EDIT: Updated to note that the trace capture branch is now merged with the trunk.