kernel invocation memory usage


I have two quick questions about memory use by CUDA. I have a program called VideoMemory that monitors the amount of memory used on the graphics card. It tells me that every time a CUDA-enabled program is started, about 30 MB are used, and they are not freed until the program is closed (they remain allocated after a kernel call). What is this memory used for, or is VideoMemory not functioning properly?
It also seems that in my CUDA program the memory use goes up drastically (by about 100 MB) while the kernel is executing. This is on top of the memory allocated with cudaMalloc. How does a kernel use global memory apart from the initial data allocated with cudaMalloc?

Thanks in advance.

PS: I apologize for the many questions


I can imagine that CUDA, once initialized, reserves a block of memory for administrative data, constant memory and a cache for the actual program text.

Have you compared the displayed usage with what cuMemGetInfo reports?
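
In case it helps with the comparison, here is a minimal sketch of how one might print the numbers from inside the program. It uses cudaMemGetInfo, the runtime API counterpart of cuMemGetInfo. The helper name reportUsage and the 64 MB test allocation are made up for illustration, and the cudaFree(0) call is just a common way to force context creation. Note that the "used" figure covers the whole device, so it can include memory held by other processes and the display.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print device memory usage as seen by the CUDA runtime.
static void reportUsage(const char* label) {
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::printf("%s: cudaMemGetInfo failed: %s\n", label, cudaGetErrorString(err));
        return;
    }
    const double mb = 1024.0 * 1024.0;
    std::printf("%s: %.1f MB used of %.1f MB\n",
                label, (totalBytes - freeBytes) / mb, totalBytes / mb);
}

int main() {
    // Force context creation so its overhead shows up in the first reading.
    cudaFree(0);
    reportUsage("after context creation");

    // Illustrative allocation; the size is an arbitrary choice for this sketch.
    void* buf = nullptr;
    const size_t bytes = 64u * 1024u * 1024u;
    if (cudaMalloc(&buf, bytes) == cudaSuccess) {
        reportUsage("after cudaMalloc of 64 MB");
        cudaFree(buf);
    }
    reportUsage("after cudaFree");
    return 0;
}
```

If the first reading already shows roughly the 30 MB that VideoMemory reports, that would suggest the tool is working and the memory belongs to the CUDA context rather than to anything allocated explicitly.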


That's a good idea, I'll try that.

What about the kernel memory usage? I seem to be getting about twice the usage when I call the kernel. Does it make some form of copies? I can't really check that with cuMemGetInfo, since the extra usage is gone after the kernel invocation.

Thanks for the quick reply.