Why is the GPU still being used while a CUDA out-of-memory error occurs?

I am using TensorFlow to perform inference on a dataset on Ubuntu. Although it reports a CUDA out-of-memory error, the nvidia-smi tool still shows that the GPU is in use, as shown below:

My code predicts one example at a time, so no batching is used. I am using GPU 0, so the first 47% entry is the one my code is using. The error message is below:

INFO:tensorflow:Restoring parameters from /plu/../../model-files/model.ckpt-2683000
2021-09-09 07:49:24.230623: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 15.75G (16914055168 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-09-09 07:49:31.674556: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
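
For context, my prediction loop looks roughly like the sketch below. The checkpoint path, tensor names, and input shape are placeholders rather than my real values; the actual graph and weights are restored from the checkpoint shown in the log above.

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Placeholders: the real checkpoint path, tensor names, and input shape differ.
CHECKPOINT = "model-files/model.ckpt-2683000"
examples = [np.zeros((224, 224, 3), dtype=np.float32)]  # stand-in for my dataset

with tf.Session() as sess:
    # Restoring the saved graph and weights prints the "Restoring parameters from ..." line.
    saver = tf.train.import_meta_graph(CHECKPOINT + ".meta")
    saver.restore(sess, CHECKPOINT)

    graph = tf.get_default_graph()
    inputs = graph.get_tensor_by_name("input:0")
    outputs = graph.get_tensor_by_name("output:0")

    # Predict one example at a time; no batching.
    for example in examples:
        prediction = sess.run(outputs, feed_dict={inputs: example[np.newaxis, ...]})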

My machine has a lot of memory, as shown below:

free -hm
              total        used        free      shared  buff/cache   available
Mem:           125G         16G        8.3G        1.1G        100G        107G
Swap:            0B          0B          0B

I have 2 questions:

1. Why is the GPU still being used normally while a CUDA out-of-memory error occurs?
2. My machine seems to have plenty of memory. Does this mean the 107G of available system RAM is not used at all, and only the 16G of GPU (CUDA) memory is used, which caused the out-of-memory error?
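
For what it's worth, I am not configuring GPU memory anywhere; the session is created with TensorFlow defaults. A minimal sketch of the kind of GPU options I have not set (assuming the tf.compat.v1 ConfigProto API) would be:

import tensorflow.compat.v1 as tf

# Options I have NOT set; my session uses the defaults, which as far as I know
# try to reserve most of the GPU's 16G up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap the fraction of GPU memory

sess = tf.Session(config=config)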