I have built a machine-learning computer with two RTX 2070 SUPER GPUs connected by an SLI bridge (running Windows).
I have benchmarked the system using http://ai-benchmark.com/alpha and got impressive results.
I then ran the same benchmark in a GPU-enabled TensorFlow container, using the “latest-gpu-py3-jupyter” tag.
I configured this container as the interpreter for the same project in PyCharm (I mounted the project folder in the container).
When I run it, I get the error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[50,56,56,144] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[node MobilenetV2/expanded_conv_2/depthwise/BatchNorm/FusedBatchNorm (defined at usr/local/lib/python3.6/dist-packages/ai_benchmark/utils.py:238) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
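As far as I understand, this error means that memory was exhausted inside the container. I assumed it was GPU memory, although the trace names device:CPU:0 and the cpu allocator.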
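To check that reading against the trace itself, here is a minimal sketch that pulls the shape, device, and allocator out of the message and computes how large the failed allocation was. The helper name `summarize_oom` is hypothetical, and the 4 bytes per element is an assumption that "type float" means 32-bit floats.

```python
import re

# Message copied from the traceback above.
OOM_MESSAGE = (
    "OOM when allocating tensor with shape[50,56,56,144] and type float "
    "on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu"
)

# Assumption: "type float" means 32-bit floats (4 bytes per element).
BYTES_PER_FLOAT = 4


def summarize_oom(message):
    """Extract shape, device, and allocator from an OOM message and compute the tensor size."""
    shape_match = re.search(r"shape\[([\d,]+)\]", message)
    device_match = re.search(r"/device:(\w+):(\d+)", message)
    allocator_match = re.search(r"by allocator (\S+)", message)
    if not (shape_match and device_match and allocator_match):
        raise ValueError("message does not look like a TensorFlow OOM error")

    shape = [int(dim) for dim in shape_match.group(1).split(",")]
    elements = 1
    for dim in shape:
        elements *= dim

    return {
        "shape": shape,
        "device_type": device_match.group(1),
        "device_index": int(device_match.group(2)),
        "allocator": allocator_match.group(1),
        "bytes": elements * BYTES_PER_FLOAT,
    }


summary = summarize_oom(OOM_MESSAGE)
print(summary)
print("tensor size: %.1f MiB" % (summary["bytes"] / 2**20))
```

For this message the failed allocation is 90,316,800 bytes (about 86 MiB), and the device type it reports is CPU rather than GPU.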
Why does the GPU on the Windows host handle the computation successfully, while the GPU in the Linux container runs out of memory?
What causes this difference? Is it related to memory allocation in the container?
Your help is highly appreciated.
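One check I can run in both environments is whether TensorFlow sees the GPUs at all. This is only a minimal sketch: `visible_gpus` is a hypothetical helper name, and which of the two API calls is available depends on the TensorFlow version installed in the image, which I have not confirmed.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    try:
        # Available in TensorFlow 2.1 and later.
        devices = tf.config.list_physical_devices("GPU")
    except AttributeError:
        # Fallback for older releases that only expose the experimental namespace.
        devices = tf.config.experimental.list_physical_devices("GPU")

    return [device.name for device in devices]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this interpreter")
elif not gpus:
    print("TensorFlow sees no GPU; operations will be placed on the CPU")
else:
    print("TensorFlow sees these GPUs:", gpus)
```

If the list is empty inside the container but not on the Windows host, the difference would be in how the container is given access to the GPUs rather than in how much GPU memory the benchmark needs.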