Looking for compute-bound CUDA examples

I thought I’d throw this out to the community: I’m doing research on concurrent kernels in CUDA, and I’m trying to find some examples (SDK, benchmark, etc.) of applications with compute-bound kernels. The tests I’ve done have shown that the NVIDIA CUDA SDK examples are memory bound. Does anyone know of any compute-bound examples? Thanks!

P.S. I should note that I can code up a compute-bound example myself, but I’m trying to find examples of code that might actually be used for something in the real world. Cheers!

nbody is compute bound. If it is not compute bound enough for you with the default N, increase N: the amount of computation grows as O(N^2).