Determine CUDA Context Memory Usage

Is there a way to determine how much GPU memory creating a context will require?

I have a multi-process system where I need to load-balance and limit memory across GPUs for tens of processes.
I can estimate how much memory "I think" CUDA will use by pre-auditing my buffers, whose sizes are known. I realize this may not map 1:1 to actual usage, depending on how the GPU allocates memory, but it appeared to be close.

I just moved my code from a GeForce GT 640 to a TITAN X (Pascal) card. What I expected to consume around 30 MB is taking roughly 181 MB, whereas on the GT 640 it took around 50 MB. It looks like just instantiating a CUDA context takes about 149 MB on the TITAN X. Is there a way to determine this amount in advance? Will it vary across different cards?

Thanks!