During my simulation, I need to copy data back to the host periodically for I/O. However, copying the data back from the GPU to the CPU (about 30 MB each time) slows the simulation down dramatically. Is there a good way to avoid this? I'm thinking of compressing the data on the GPU before copying it back. Is there a good compression library that runs on Fermi GPUs and supports single- and/or double-precision data?