How to understand the memory footprint of a CUDA context?

The usual way to request extensions to CUDA (e.g. new functions/APIs) is to file a bug. You can provide a link in the bug to this forum post if you think it is important. The better the justification you can give, the more likely it is that the request will receive some priority. Requests for things that can already be accomplished by an alternate method may receive lower priority.
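In the meantime, a rough way to observe a context's device-memory footprint today (i.e. the "alternate method") is to force creation of the primary context and then compare total versus free device memory with `cudaMemGetInfo`. This is only a sketch: on a device with nothing else resident, the difference approximates the context's reserved device memory, but other processes sharing the GPU will inflate the number.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // cudaFree(0) is a common idiom to force initialization of the
    // primary context without allocating anything.
    cudaFree(0);

    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // On an otherwise idle GPU, (total - free) approximates the device
    // memory consumed by context creation (plus any driver reservations).
    printf("Consumed after context creation: %zu MiB\n",
           (totalBytes - freeBytes) >> 20);
    return 0;
}
```

You can cross-check the number against the per-process memory usage reported by `nvidia-smi` while the program is running; note that `nvidia-smi` reports the process's total device-memory usage, which includes the context.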