multiple modules

If I have multiple independent modules that are executed sequentially, what is the best way to set up CUDA? I would assume I should load each module before use and unload it afterwards every time, since doesn't the module reside in global memory? But this seems wrong to me. Any insights?
Basically, my applications are called multiple times, and each time every independent program is executed.