Fermi uses 200MB device memory extra

Hi

for a while now I’ve been wondering why fermi level gpus use so much more device memory. Generally in my application they seem to use 200MB on top of what my program allocates (no matter if I use 1.5MB myself or 1000MB). This does not depend on whether I compile for arch=sm_20 or arch=sm_13. With the latter I actually can run the same exectuable on a GTX295 and a GTX470. And again the GTX470 needs 200MB on top of what my program uses.

Anyone knows a reason, and how to avoid that? I mean 200MB is really a significant amount of memory.

The code I am talking about is included in the gpulammps project. I am developing the USER-CUDA package.
http://code.google.com/p/gpulammps/

cheers
Ceearem

It is used for stack, heap, and printf buffer to support the new features on Fermi. You can try reducing the size with cudaThreadSetLimit().