Device Functions: Memory needs of a device function

I’m currently designing the code for my first CUDA application. Although I’ve read a lot about CUDA, I still have one question (so far).
I’m wondering which part of memory (shared, global, constant, or something else) a device function (not a kernel function) uses when it executes. This seems like something one should know during the design process in order to achieve memory coalescing.
Thanks in advance! Any help would be welcome!