2GB Protobuf limit

Greetings,

I am training some deep learning models on Ubuntu Server 18.04 using the following NVIDIA Docker image: nvcr.io/nvidia/tensorflow:20.01-tf1-py3

Tensorflow-gpu 1.15.2
CUDA 10.2
GPU TITAN RTX 24GB

Everything is working very well, with a solid performance boost compared to my Windows setup.
However, it is limited to a window size of 328×328×328; as soon as I choose any larger window size, the following happens:

File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3166, in _as_graph_def
graph.ParseFromString(compat.as_bytes(data))

As far as I understand, this is caused by the protobuf limit, which is set to 2 GB by default. If I print the length of the serialized graph, it is just under 2 GB at the working window size and just over it when it fails.
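
For reference, this is roughly how I check the size of the serialized graph (a minimal sketch for TF 1.15; the actual model-building code is omitted):

```python
import tensorflow as tf

# Serialize the current default graph to a GraphDef protobuf
graph_def = tf.get_default_graph().as_graph_def()
serialized = graph_def.SerializeToString()

# The 2 GB protobuf limit applies to this byte string
print("Serialized graph size: %.2f GB" % (len(serialized) / 1024**3))
```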

If I run the same script on Windows 10 (with a pip install of tensorflow-gpu==1.15.2), the protobuf error does not appear, but it runs out of memory at higher resolutions because Windows uses a significant amount of VRAM in the background.

My question is: how do I raise the protobuf limit while using the Docker image in question? I am not interested in simply lowering the window size, because the larger the window size we choose, the better the quality of the final output.

I have tried looking into ops.py and also coded_stream.h, but without knowing exactly what to change.
