Graduate course project ideas

I’m looking for a CUDA project for a graduate course in parallel computing. Ideally, the project should take about 4 weeks to complete and result in a publishable paper and/or source code. I’m thinking along the lines of an SDK-type project that hasn’t been done yet.

Any ideas?

You could consider implementing a fast split (a stable partition of an array by a per-element 0/1 flag). The scan + scatter approach in the scan paper is too inefficient. I suspect the scatter step alone is slower than an entire split done efficiently (e.g. one that uses coalesced writes).
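For anyone unfamiliar with the scan + scatter formulation of split, here is a minimal sequential sketch. It is a host-side C++ reference written for illustration, not the scan paper's code, and the name `split_scan_scatter` is hypothetical. It assumes flags are 0 or 1, with flag-0 elements placed first. Each loop corresponds to one data-parallel pass on the GPU. A host version like this could also serve as a correctness check for a faster CUDA implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stable split by a 0/1 flag using exclusive scan + scatter:
// flag-0 elements go first, flag-1 elements after, and the relative
// order within each group is preserved. (Hypothetical helper.)
std::vector<int> split_scan_scatter(const std::vector<int>& in,
                                    const std::vector<int>& flags) {
    assert(in.size() == flags.size());
    const std::size_t n = in.size();

    // Pass 1: exclusive scan of the predicate (flag == 0).
    // zeros_before[i] = number of flag-0 elements in in[0..i-1].
    std::vector<std::size_t> zeros_before(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        zeros_before[i] = running;
        running += (flags[i] == 0) ? 1 : 0;
    }
    const std::size_t total_zeros = running;

    // Pass 2: scatter. A flag-0 element goes to zeros_before[i].
    // A flag-1 element goes after all zeros, offset by the number of
    // flag-1 elements before it, which is i - zeros_before[i].
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t dst = (flags[i] == 0)
            ? zeros_before[i]
            : total_zeros + (i - zeros_before[i]);
        out[dst] = in[i];
    }
    return out;
}
```

For example, with `in = {5, 2, 8, 1, 9, 4}` and flags set to each value's low bit (`{1, 0, 0, 1, 1, 0}`), the result is `{2, 8, 4, 5, 1, 9}`. The destination addresses in the scatter pass depend on the data. On a GPU, adjacent threads therefore tend to write to non-adjacent locations, which means uncoalesced global-memory writes. That is the cost a faster split would try to avoid.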