Recently I’ve started working on object/instance segmentation problems using convolutional networks (like FCN-8s and CRF-RNN). They do a great job, and I use their outputs extensively, but there’s a persistent problem with memory. We currently run a Tesla K40m with 12 GB of memory. That’s more than enough for text tasks, but for images I’ve run into a number of issues, such as running out of memory and overheating.
For example, inference (i.e. testing) on one 1280x720 image (~1M pixels, exactly what I need) takes only a fraction of a second on the GPU, which is great (the network has ~140M parameters, ~537 MB in total). Training, however, immediately fails with an out-of-memory error, even with a batch size of 1. Eventually I had to reduce the image size to 500x500 (250K pixels, which is still usable but far worse than 1280x720) to do any training at all (again with batch size 1).
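For reference, here is the back-of-envelope arithmetic behind that model-size figure (my assumption is that the weights are stored as 32-bit floats):

```python
# Rough model size for inference: parameters x bytes per parameter.
num_params = 140e6   # ~140M parameters, from my model summary
bytes_per_float = 4  # float32

model_mb = num_params * bytes_per_float / 1024**2
print(f"model weights: ~{model_mb:.0f} MB")  # ~534 MB, close to the ~537 MB above
```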
So here’s my question: why does this happen? It can’t just be the derivatives: with one derivative per weight, weights plus gradients together should take only ~1 GB (see above). So what is using the rest? How is one ~1M-pixel image loaded into memory, and how does CUDA process it? And is there any way to predict how much memory I’ll need if I know the image size and the number of images in a batch?
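To make my reasoning concrete, here is the estimate I’m working from: it counts only the weights, one gradient per weight (both float32), and a single input image, and it lands nowhere near the card’s 12 GB:

```python
# Why my own estimate can't explain the out-of-memory error:
num_params = 140e6
bytes_per_float = 4  # float32

weights_gb = num_params * bytes_per_float / 1024**3    # ~0.52 GB
grads_gb = weights_gb                                  # one derivative per weight
image_mb = 1280 * 720 * 3 * bytes_per_float / 1024**2  # one RGB image as float32

print(f"weights + gradients: ~{weights_gb + grads_gb:.2f} GB")  # ~1.04 GB
print(f"one 1280x720 RGB image: ~{image_mb:.1f} MB")            # ~10.5 MB
```

Clearly something else dominates the memory during training, and that’s the part I can’t account for.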