Excessive GPU memory consumption of "tensorflow.keras.metrics.Metric" objects

I’m really new to TensorFlow and just found something unusual when instantiating a Keras metric object as follows.

import tensorflow as tf
m = tf.keras.metrics.Mean(name='test')

As soon as I execute the two lines above in Python, GPU memory consumption soars from 0% to around 95% (about 10 GiB) in a moment, and it never goes down until I terminate the program or delete the instance. I checked this with the nvtop GPU monitor.

My machine is an Ubuntu server equipped with eight RTX 2080 Ti GPUs. I’m also using a Docker image provided by NVIDIA NGC (specifically, nvcr.io/nvidia/tensorflow:20.03-tf2-py3).

I observed the same issue on a Titan Xp machine, and another Docker image (nvcr.io/nvidia/tensorflow:20.01-tf2-py3) showed the same behavior.

Do you guys get the same issue? Is it a bug in TensorFlow or in the Docker image?

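I just found out that it happened because I didn’t set the GPU memory growth option. By default, TensorFlow reserves nearly all of the memory on every visible GPU the first time it initializes them, and creating the metric object was enough to trigger that. So nothing was wrong with TensorFlow or the Docker images; the problem was my ignorance of basic TensorFlow usage.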

For newcomers who run into the same problem, enable the GPU memory growth option with the following Python code.

import os
# Set this before TensorFlow initializes the GPUs (safest: before importing tensorflow).
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
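
If you would rather not rely on an environment variable, the same option can also be set from inside the program. Below is a minimal sketch, assuming TensorFlow 2.1 or later (on 2.0 the device-listing call should be under tf.config.experimental instead). The helper name enable_gpu_memory_growth is my own and is not part of TensorFlow.

```python
import tensorflow as tf


def enable_gpu_memory_growth():
    """Hypothetical helper: turn on memory growth for every visible GPU."""
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # This must run before any op or Keras object initializes the GPU;
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus


configured = enable_gpu_memory_growth()
print(f"Memory growth enabled on {len(configured)} GPU(s)")

# Only now create objects that touch the GPU.
m = tf.keras.metrics.Mean(name="test")
```

With memory growth enabled, TensorFlow should start with a small allocation and extend it as needed rather than reserving almost the whole card up front. On a machine without GPUs the loop simply does nothing.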

Refer to the official TensorFlow guide: https://www.tensorflow.org/guide/gpu