After successfully training the network and saving it in the SavedModel format, TF Serving displays the following error:

2021-05-11 13:41:27.083035: I tensorflow/cc/saved_model/] SavedModel load for tags { serve }; Status: success: OK. Took 1886711 microseconds.
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Source info :
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Receptive field  : [160, 160]
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Placeholder name : lr_input
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Output spacing ratio: 0.25
2021-05-11 13:41:27 (INFO) TensorflowModelServe: The TensorFlow model is used in fully convolutional mode
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Output field of expression: [512, 512]
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Tiling disabled
2021-05-11 13:41:27 (WARNING): Streaming configuration through extended filename is used. Any previous streaming configuration (ram value, streaming mode ...) will be ignored.
2021-05-11 13:41:27 (INFO): File Image_test.tif will be written in 110 blocks of 512x512 pixels
Writing Image_test.tif?&gdal:co:COMPRESS=DEFLATE&streaming:type=tiled&streaming:sizemode=height&streaming:sizevalue=512...: 0% [                                                  ]2021-05-11 13:41:27.770868: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-05-11 13:41:28.572215: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-05-11 13:41:28.573738: E tensorflow/stream_executor/cuda/] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-05-11 13:41:28.573882: W tensorflow/core/framework/] OP_REQUIRES failed at : Not found: No algorithm worked!

Reading about it, the commonly suggested fix is:

# TF1-style API; under TF 2.x (as here, TF 2.4.1) it needs the compat module:
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)

but this solution conflicts with the code that I’m trying to run.
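When the session-configuration snippet can't be injected into the code being run (e.g. a compiled serving CLI like the TensorflowModelServe application in the log above), TensorFlow 2.x also honors the TF_FORCE_GPU_ALLOW_GROWTH environment variable, which requests the same on-demand allocation without touching any Python code. A minimal sketch of that workaround:

```shell
# Request on-demand GPU memory allocation — the same effect as
# allow_growth=True — without modifying any application code.
export TF_FORCE_GPU_ALLOW_GROWTH=true
# ...then launch the serving command as usual in this same shell.
```

The variable must be set in the environment of the process that initializes the GPU, so export it before launching the server.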

So the question is: why is this error occurring in the first place?

TF 2.4.1, CUDA 11.3, cuDNN 8.2

If that solution fixes it, the problem is TF's greedy allocation strategy: when allow_growth is not set, TF reserves nearly all GPU memory up front. When cuBLAS is initialized later, it needs some GPU memory of its own; with almost none left, its initialization fails with CUBLAS_STATUS_NOT_INITIALIZED.
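The behavior above can be avoided in TF 2.x without the session-based snippet, using the per-device memory-growth setting (a minimal sketch, assuming a TF 2.4-style API; it must run before any op touches the GPU):

```python
import tensorflow as tf

# Ask TF to allocate GPU memory on demand instead of grabbing
# (nearly) all of it up front, leaving room for the cuBLAS/cuDNN
# handles to initialize later.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```

On a machine with no visible GPU the loop simply does nothing, so the snippet is safe to keep in CPU-only runs as well.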

This SO question/answer has additional relevant information. I won’t be able to sort this out for you here, and this particular forum is not really the right place to ask your question. I’m unlikely to respond to follow-up questions.

cuBLAS-specific questions should be asked on our accelerated libraries forum, but this question is really about TensorFlow behavior. NVIDIA doesn’t develop or support TensorFlow.

It makes sense now, thanks for explaining the process.