Compression in Fermi

During my simulation, I need to copy data back to the host periodically for I/O. However, each copy from GPU to CPU (about 30 MB each time) slows the simulation down dramatically. Is there a good way to avoid this? I'm thinking of compressing the data before copying it back. Is there a good compression library that runs on Fermi GPUs and supports single- and/or double-precision data?

Thanks,
Tuan

I would suggest looking into CUDA streams and cudaMemcpyAsync() to overlap the copy back to the host with simulation work on the GPU.
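
Here is a rough sketch of what that could look like, not a drop-in solution. It assumes the simulation can be written as a kernel launched once per step (step_kernel below is a made-up placeholder), that the state fits in a few device buffers, and that the GPU has at least one copy engine. The names run_overlapped, d_snap, h_out and io_every are mine, purely for illustration. The key points are that the host buffer is page-locked (cudaMallocHost), since cudaMemcpyAsync() only overlaps with kernels when the host memory is pinned, and that the device-to-host copy runs in its own stream from a snapshot buffer so the simulation can keep stepping.

```cuda
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in for one simulation step: reads `in`, writes `out`.
__global__ void step_kernel(const float* in, float* out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// Runs `steps` steps and copies the state to the host every `io_every` steps.
// Returns element 0 of the last snapshot that reached the host.
float run_overlapped(size_t n, int steps, int io_every) {
    const size_t bytes = n * sizeof(float);
    float *d_a, *d_b, *d_snap, *h_out;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMalloc(&d_snap, bytes));      // device-side snapshot
    CUDA_CHECK(cudaMallocHost(&h_out, bytes));   // pinned host buffer
    CUDA_CHECK(cudaMemset(d_a, 0, bytes));
    h_out[0] = 0.0f;

    cudaStream_t compute, copy;
    cudaEvent_t snap_ready, copy_done;
    CUDA_CHECK(cudaStreamCreate(&compute));
    CUDA_CHECK(cudaStreamCreate(&copy));
    CUDA_CHECK(cudaEventCreateWithFlags(&snap_ready, cudaEventDisableTiming));
    CUDA_CHECK(cudaEventCreateWithFlags(&copy_done, cudaEventDisableTiming));

    const unsigned int block = 256;
    const unsigned int grid = (unsigned int)((n + block - 1) / block);

    for (int s = 1; s <= steps; ++s) {
        step_kernel<<<grid, block, 0, compute>>>(d_a, d_b, n);
        CUDA_CHECK(cudaGetLastError());
        std::swap(d_a, d_b);                     // d_a now holds the newest state

        if (s % io_every == 0) {
            // Wait until the previous snapshot has fully arrived in h_out
            // (returns immediately the first time, nothing recorded yet).
            CUDA_CHECK(cudaEventSynchronize(copy_done));
            // ... write the previous snapshot in h_out to disk here ...

            // Fast device-to-device snapshot, ordered after the step above.
            CUDA_CHECK(cudaMemcpyAsync(d_snap, d_a, bytes,
                                       cudaMemcpyDeviceToDevice, compute));
            CUDA_CHECK(cudaEventRecord(snap_ready, compute));

            // Slow device-to-host copy in its own stream; later steps in
            // `compute` can run while this transfer is in flight.
            CUDA_CHECK(cudaStreamWaitEvent(copy, snap_ready, 0));
            CUDA_CHECK(cudaMemcpyAsync(h_out, d_snap, bytes,
                                       cudaMemcpyDeviceToHost, copy));
            CUDA_CHECK(cudaEventRecord(copy_done, copy));
        }
    }

    CUDA_CHECK(cudaDeviceSynchronize());         // drain both streams
    float first = h_out[0];

    CUDA_CHECK(cudaEventDestroy(snap_ready));
    CUDA_CHECK(cudaEventDestroy(copy_done));
    CUDA_CHECK(cudaStreamDestroy(compute));
    CUDA_CHECK(cudaStreamDestroy(copy));
    CUDA_CHECK(cudaFreeHost(h_out));
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_snap));
    return first;
}

int main() {
    float v = run_overlapped(1 << 20, 10, 5);
    printf("element 0 of last snapshot: %f\n", v);
    return 0;
}
```

The extra device-to-device copy costs some memory but is cheap compared with the PCIe transfer, and it lets the simulation overwrite its working buffers right away. If memory is tight, you could instead make the compute stream wait on copy_done before it touches the buffer being transferred, at the cost of less overlap.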