ResNet101 nvinferserver initialization error

Please provide complete information as applicable to your setup.

• Hardware Platform: RTX 2080
• DeepStream Version: 5.0
• NVIDIA GPU Driver Version: 440.33.01
• Issue Type: bug
Hi,
I am trying to run a ResNet101-based GraphDef model using nvinferserver. There seems to be some issue during initialization. It throws the following error:

ERROR: infer_trtis_server.cpp:202 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:2 root error(s) found.
  (0) Failed precondition: Attempting to use uninitialized value stage2_unit1_sc_bias
	 [[{{node stage2_unit1_sc_bias/read}}]]
	 [[fc1/add_1/_3]]
  (1) Failed precondition: Attempting to use uninitialized value stage2_unit1_sc_bias
	 [[{{node stage2_unit1_sc_bias/read}}]]
0 successful operations.
0 derived errors ignored.

I face a similar error if I do not include the following two lines in the Python script:

def tf_init(tf_model_file):
        ...
        with tf.Graph().as_default() as graph:
                ...
                init = graph.get_operation_by_name("import/init")
                sess.run(init)
                ...

It throws an error:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value import/stage2_unit2_bn1_scale
	 [[{{node import/stage2_unit2_bn1_scale/read}}]]

Thanks.

Hi,

Attempting to use uninitialized value stage2_unit1_sc_bias

This error indicates that some parameters were not set before use.
Have you initialized the parameters with the following command first?

tf.global_variables_initializer()
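For context, a minimal TF1-style sketch (the variable `v` here is just a stand-in for the model's parameters): `tf.global_variables_initializer()` only returns an op; nothing is actually initialized until that op is run in a session.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

v = tf.Variable(3.0, name="v")            # stand-in for a model parameter
init = tf.global_variables_initializer()  # creates the init op; runs nothing yet

with tf.Session() as sess:
    sess.run(init)       # variables only hold values after this call
    value = sess.run(v)  # safe to read now; without sess.run(init) this
                         # would raise FailedPreconditionError
```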

Thanks.

Hi,
I have already included the suggested command. Here is my init function, which works fine without throwing an error:

def tf_init(tf_model_file):
    print("\n\ninit\n\n")
    device_name = "GPU:0"
    f = tf.gfile.GFile(tf_model_file, "rb")
    graph_def = tf.GraphDef()
    str1 = f.read()
    graph_def.ParseFromString(str1)
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def)
        inp = tf.get_default_graph().get_tensor_by_name('import/data:0')
        out = tf.get_default_graph().get_tensor_by_name('import/fc1/add_1:0')
        # Note: these three calls only create initializer ops; they are never
        # passed to sess.run(), so on their own they initialize nothing.
        tf.global_variables_initializer()
        tf.constant_initializer()
        tf.local_variables_initializer()
        #init = graph.get_operation_by_name("import/init")

        print(inp, out)
        cfg = dict({'allow_soft_placement': True, 'log_device_placement': False})
        utility = 0.3
        cfg['gpu_options'] = tf.GPUOptions(per_process_gpu_memory_fraction=utility, allow_growth=True)
        cfg['allow_soft_placement'] = True
        cfg['device_count'] = {'GPU': 1}
        cfg['use_per_session_threads'] = True
        cfg['intra_op_parallelism_threads'] = 1
        cfg['inter_op_parallelism_threads'] = 1
        sess = tf.Session(config=tf.ConfigProto(**cfg))
        # Running the graph's own "init" op is what actually initializes the variables.
        init = graph.get_operation_by_name("import/init")
        sess.run(init)
    print("\n\ninit-exit\n\n")
    return sess, inp, out

If I do not include the following two lines, it throws the error shown above.

def tf_init(tf_model_file):
        ...
        with tf.Graph().as_default() as graph:
                ...
                init = graph.get_operation_by_name("import/init")
                sess.run(init)
                ...

My objective is to run the model in a DeepStream pipeline (using nvinferserver). I am getting errors similar to those shown above, which makes me believe there is some issue during initialization.

Hi,

You will need to initialize the TensorFlow parameters before inference.
Do you still see the error after adding the initialization?

Also, it seems that you are trying to use TensorFlow within a DeepStream pipeline.
Would you mind telling us more about your procedure?

To run inference with a DeepStream pipeline, the model should normally be converted to TensorRT rather than kept in TensorFlow.

Thanks.

Hi @AastaLLL,

I do not face any error after adding the initialization. Yes, I am trying to use TensorFlow (GraphDef) with DeepStream, which is supported in the latest version of DeepStream (5.0) via the nvinferserver plugin (which makes API calls to the Triton Inference Server). A TensorRT-convertible model was a requirement up to the previous version (using the nvinfer plugin), but in DeepStream 5.0 TensorFlow models are also supported (using the nvinferserver plugin).

I have created the following pipeline:
src -> h264parse -> nvv4l2decoder -> nvinferserver -> queue -> fakesink

I have used this pipeline to run inference on TensorFlow models before without errors, but this model throws the error given above.
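For reference, when serving a GraphDef through nvinferserver/Triton, the model directory normally needs a `config.pbtxt`. Below is a minimal sketch, not a verified config: the tensor names `data` and `fc1/add_1` are taken from the error logs in this thread (the published model may expose `softmax` instead), while the model name, `max_batch_size`, and dims are assumptions based on the infer script's reported shapes.

```
name: "resnet101_graphdef"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "data"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 224, 224, 3 ]
  }
]
output [
  {
    name: "fc1/add_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```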

Thanks.

Hi,

Thanks for the clarification.
We will check this internally and update more information with you later.

Thanks.

Hi,
Sorry for the late reply.
Is it possible to provide the model you used for a local repro?

Hi,
I cannot share the model I am using as it is proprietary. I have converted an off-the-shelf ResNet101 to GraphDef format, which gives a similar issue. Here is the link from where you can get the model and the script to run it:

Hi,
I tried to run your model, but I have no experience with TensorFlow. I tried with tensorflow-gpu 1.14 and 1.15 and always hit the error below. Can you specify how we should run it first?

init

WARNING:tensorflow:From infer.py:16: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From infer.py:17: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

Traceback (most recent call last):
File "infer.py", line 69, in <module>
sess,tf_inp,tf_out = tf_init(model_path)
File "infer.py", line 19, in tf_init
graph_def.ParseFromString(str1)
google.protobuf.message.DecodeError: Error parsing message

Hi,
I think while importing, use:

import tensorflow.compat.v1 as tf

I have the code working with both tf versions 1.14 (GPU) and 2.3.1.
Secondly, make sure the model is downloaded properly (go to the link and download the .pb file separately). It should be a 170 MB file. It is a Git LFS file, so downloading the folder may lead to an improper download (I faced the issue myself; only the pointer to the file was downloaded).

If this does not work, then there may be an issue with the protobuf version. I have tried this with the following environments:
tensorflow==2.3.1, protobuf==3.6.0
and
tensorflow-gpu==1.14.0, protobuf==3.11.1

This should work, as I have tried fresh installs of TensorFlow on multiple systems and made the code run.

If any further help is required, you can reply here.

Thanks.

The pb file is just 134 bytes:
-rw-rw-r-- 1 amyc amyc 134 Nov 5 18:33 tf_frozen_resnet101.pb
I downloaded the zip package, and I also tried downloading with
git clone https://github.com/sherlockdutta/models.git
but got only README.md:
amyc@amycserver:~/work/models$ ls
README.md
Anything wrong?

Finally downloaded it by opening the link to the file directly.
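A tiny file like the 134-byte .pb above is the classic symptom of downloading a Git LFS pointer instead of the real binary. A quick sanity check (pure Python sketch; the file path is whatever you downloaded):

```python
def looks_like_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer file, not real data.

    LFS pointers are tiny text files that start with a fixed version line,
    whereas a real frozen .pb is many megabytes of binary protobuf.
    """
    with open(path, "rb") as f:
        head = f.read(64)
    return head.startswith(b"version https://git-lfs.github.com/spec/v1")
```

If this returns True, re-download the file via the web UI or run `git lfs pull` in the clone.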

Hi,
Sorry, I did not upload a proper README for getting the setup working and downloading the model file. I will update the README. The model is on the master branch, not the main branch.
Reply if any further help is required.

Thanks.


Hi,
We can run your model successfully on a Tesla P4 or T4 card. Please let me know whether the issue in your description is fixed or not.

root@27c40e1cfcbe:~/workspace/models-master# CUDA_VISIBLE_DEVICES=1 python infer.py

init

Tensor("import/data:0", shape=(?, 224, 224, 3), dtype=float32) Tensor("import/softmax:0", shape=(?, 1000), dtype=float32)
2020-11-25 10:34:46.178999: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-11-25 10:34:46.206023: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2394595000 Hz
2020-11-25 10:34:46.206253: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x992a5e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:34:46.206287: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-25 10:34:46.209399: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-25 10:34:46.332165: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x8f8e670 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:34:46.332226: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P4, Compute Capability 6.1
2020-11-25 10:34:46.333680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:04:00.0
2020-11-25 10:34:46.334261: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-25 10:34:46.336358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-25 10:34:46.338268: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-11-25 10:34:46.338862: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-11-25 10:34:46.341406: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-11-25 10:34:46.343472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-11-25 10:34:46.348319: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-25 10:34:46.349768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-11-25 10:34:46.349836: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-25 10:34:46.351017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 10:34:46.351039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-11-25 10:34:46.351061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-11-25 10:34:46.352481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5328 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:04:00.0, compute capability: 6.1)

init-exit

1.15.2
(1, 224, 224, 3)
2020-11-25 10:34:49.038501: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-25 10:34:49.173622: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-25 10:34:50.447730: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-11-25 10:34:50.456144: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-11-25 10:34:50.499009: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-11-25 10:34:50.507382: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
tf_processing : exit!!

Hi @amycao,
I am also able to run the infer script for the model, but the same model does not run inference through the nvinferserver plugin. It gives the following error:

ERROR: infer_trtis_server.cpp:202 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:2 root error(s) found.
  (0) Failed precondition: Attempting to use uninitialized value stage2_unit1_sc_bias
	 [[{{node stage2_unit1_sc_bias/read}}]]
	 [[fc1/add_1/_3]]
  (1) Failed precondition: Attempting to use uninitialized value stage2_unit1_sc_bias
	 [[{{node stage2_unit1_sc_bias/read}}]]
0 successful operations.
0 derived errors ignored.

If you remove these lines from infer.py, it gives similar errors:

init = graph.get_operation_by_name("import/init")
sess.run(init)

I want to run the model in DeepStream (using nvinferserver). My guess is that the graph is not initialized while being loaded in DeepStream. That is why I shared the model file along with the script, so that you can run and test it.
The main objective is to get the graph running on DeepStream.
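One likely culprit, offered here as an assumption rather than a confirmed diagnosis: a serving backend such as Triton loads the GraphDef but never runs its `init` op, so any `Variable` nodes remaining in the graph stay uninitialized. The usual remedy is to bake the variable values into constants before export. A sketch using a tiny stand-in graph (names `data`/`out` are illustrative, not the model's real tensors):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

with tf.Graph().as_default() as g:
    x = tf.placeholder(tf.float32, [None, 4], name="data")
    w = tf.Variable(tf.ones([4, 2]), name="w")
    y = tf.matmul(x, w, name="out")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Replace Variable nodes with Const nodes holding their current
        # values, so the exported GraphDef needs no init op at serving time.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, g.as_graph_def(), ["out"])
```

A GraphDef frozen this way can be loaded and run without calling any `init` op first, which is the situation nvinferserver/Triton puts the model in.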

Were you able to test it in a DeepStream pipeline?

Thanks.