Applications with high CPU/GPU communication

I’m looking for applications with a high CPU/GPU communication ratio.
These should be either applications in which data transfer between the CPU and the GPU takes more than 30% of the total execution time or, even better, hybrid applications in which the CPU does part of the computation and the GPU does the rest.
So far I’ve found a lot of applications with high data transfer overhead but very few (if any) hybrid applications.
Do you know of any application in either of these two categories that I could use for benchmarking purposes?
I’ve already tested the Parboil benchmark suite and the Rodinia benchmarks.
Have a look at some of the dense matrix factorization functions in the UTK MAGMA library. They hybridize different parts of the LAPACK block factorization algorithms, which involves a lot of data exchange between host and device over the lifespan of a single operation.

MAGMA is a great example. By the way, the CPU-GPU communication overhead can be reduced by overlapping copies with kernel execution. I'm not sure how many apps actually take advantage of this… (it could really reduce the overhead if properly used and tuned).
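
For anyone who hasn't used it, here is a rough sketch of what copy/kernel overlap can look like with CUDA streams. It is only an illustration under my own assumptions: the `scale` kernel, the chunk size and the number of streams are all made up for the example and are not taken from MAGMA or any of the benchmarks mentioned here. The idea is to split the data into chunks and issue copy-in, kernel and copy-out for each chunk in its own stream, so that transfers for one chunk can run while another chunk is being computed. The host buffer is pinned with `cudaMallocHost`, which asynchronous copies need in order to overlap.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only: scales each element in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Minimal error check: print the CUDA error and exit main with status 1.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error: %s (line %d)\n",            \
                    cudaGetErrorString(err_), __LINE__);             \
            return 1;                                                \
        }                                                            \
    } while (0)

int main() {
    const int nStreams = 4;            // assumed value, tune per device
    const int chunk = 1 << 20;         // elements per chunk (assumed)
    const int n = nStreams * chunk;
    const size_t chunkBytes = chunk * sizeof(float);

    float *h = nullptr, *d = nullptr;
    // Pinned (page-locked) host memory so async copies can overlap with kernels.
    CHECK(cudaMallocHost((void **)&h, n * sizeof(float)));
    CHECK(cudaMalloc((void **)&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    // Each stream handles one chunk: copy in, compute, copy out.
    // Work in different streams may overlap if the hardware allows it.
    for (int s = 0; s < nStreams; ++s) {
        const int off = s * chunk;
        CHECK(cudaMemcpyAsync(d + off, h + off, chunkBytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + off, chunk, 2.0f);
        CHECK(cudaMemcpyAsync(h + off, d + off, chunkBytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaGetLastError());         // catches kernel launch errors
    CHECK(cudaDeviceSynchronize());    // wait for all streams to finish

    // Verify on the host: every element should have been doubled.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return mismatches != 0;
}
```

How much this actually helps depends on the device (number of copy engines) and on the ratio of transfer time to compute time per chunk, so it is worth checking with a profiler rather than assuming the overlap happens.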

All of the CUDA accelerations in VMD work in a hybrid CPU/GPU fashion.
The app I develop, HOOMD-blue, is the exact opposite of what you are looking for, because the GPU does 99.999…% of the work and GPU->CPU communication is only performed when needed for disk I/O :)

Great, I will look into it as well. Thanks a lot.