ResNet101 nvinferserver initialization error

Please provide complete information as applicable to your setup.

• Hardware Platform: RTX 2080
• DeepStream Version: 5.0
• NVIDIA GPU Driver Version: 440.33.01
• Issue Type: bug
Hi,
I am trying to run a ResNet101-based GraphDef model using nvinferserver. There seems to be some issue during initialization. It throws the following error:

ERROR: infer_trtis_server.cpp:202 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:2 root error(s) found.
  (0) Failed precondition: Attempting to use uninitialized value stage2_unit1_sc_bias
	 [[{{node stage2_unit1_sc_bias/read}}]]
	 [[fc1/add_1/_3]]
  (1) Failed precondition: Attempting to use uninitialized value stage2_unit1_sc_bias
	 [[{{node stage2_unit1_sc_bias/read}}]]
0 successful operations.
0 derived errors ignored.

I face a similar error if I do not include the following two lines in the Python script:

def tf_init(tf_model_file):
        ...
        with tf.Graph().as_default() as graph:
                ...
                init = graph.get_operation_by_name("import/init")
                sess.run(init)
                ...

It throws an error:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value import/stage2_unit2_bn1_scale
	 [[{{node import/stage2_unit2_bn1_scale/read}}]]

Thanks.

Hi,

Attempting to use uninitialized value stage2_unit1_sc_bias

This error indicates that some parameters were not set before use.
Have you initialized the parameters with the following command first?

tf.global_variables_initializer()
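For context, a minimal TF1-style sketch (the variable `v` here is just a stand-in for the model's parameters): `tf.global_variables_initializer()` only returns an op; nothing is actually initialized until that op is run in a session.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

v = tf.Variable(3.0, name="v")            # stand-in for a model parameter
init = tf.global_variables_initializer()  # creates the init op; runs nothing yet

with tf.Session() as sess:
    sess.run(init)       # variables only hold values after this call
    value = sess.run(v)  # safe to read now; without sess.run(init) this
                         # would raise FailedPreconditionError
```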

Thanks.

Hi,
I have already included the suggested command. Here is my init function, which works fine without throwing an error:

def tf_init(tf_model_file):
    print("\n\ninit\n\n")
    device_name = "GPU:0"
    f = tf.gfile.GFile(tf_model_file, "rb")
    graph_def = tf.GraphDef()
    str1 = f.read()
    graph_def.ParseFromString(str1)
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def)
        inp = tf.get_default_graph().get_tensor_by_name('import/data:0')
        out = tf.get_default_graph().get_tensor_by_name('import/fc1/add_1:0')
        # Note: these three calls only create initializer ops; they are never
        # passed to sess.run(), so on their own they initialize nothing.
        tf.global_variables_initializer()
        tf.constant_initializer()
        tf.local_variables_initializer()
        #init = graph.get_operation_by_name("import/init")

        print(inp, out)
        cfg = dict({'allow_soft_placement': True, 'log_device_placement': False})
        utility = 0.3
        cfg['gpu_options'] = tf.GPUOptions(per_process_gpu_memory_fraction=utility, allow_growth=True)
        cfg['allow_soft_placement'] = True
        cfg['device_count'] = {'GPU': 1}
        cfg['use_per_session_threads'] = True
        cfg['intra_op_parallelism_threads'] = 1
        cfg['inter_op_parallelism_threads'] = 1
        sess = tf.Session(config=tf.ConfigProto(**cfg))
        # Running the graph's own "init" op is what actually initializes the variables.
        init = graph.get_operation_by_name("import/init")
        sess.run(init)
    print("\n\ninit-exit\n\n")
    return sess, inp, out

If I do not include the following two lines, it throws the error shown above.

def tf_init(tf_model_file):
        ...
        with tf.Graph().as_default() as graph:
                ...
                init = graph.get_operation_by_name("import/init")
                sess.run(init)
                ...

My objective is to run the model in a DeepStream pipeline (using nvinferserver). I am getting errors similar to those shown above, which makes me believe there is some issue during initialization.

Hi,

You will need to initialize the TensorFlow parameters before inference.
Do you still see the error after adding the initialization?

Also, it seems that you are trying to use TensorFlow within a DeepStream pipeline.
Would you mind telling us more about your procedure?

To run inference with a DeepStream pipeline, the model should normally be converted to TensorRT rather than kept in TensorFlow.

Thanks.

Hi @AastaLLL,

I do not face any error after adding the initialization. Yes, I am trying to use TensorFlow (GraphDef) with DeepStream, which is supported in the latest version of DeepStream (5.0) via the nvinferserver plugin (which makes API calls to the Triton Inference Server). A TensorRT-convertible model was a requirement up to the previous version (using the nvinfer plugin), but in DeepStream 5.0 TensorFlow models are also supported (using the nvinferserver plugin).

I have created the following pipeline:
src -> h264parse -> nvv4l2decoder -> nvinferserver -> queue -> fakesink

I have used this pipeline to run inference on TensorFlow models before without errors, but this model throws the error given above.
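For reference, when serving a GraphDef through nvinferserver/Triton, the model directory normally needs a `config.pbtxt`. Below is a minimal sketch, not a verified config: the tensor names `data` and `fc1/add_1` are taken from the error logs in this thread (the published model may expose `softmax` instead), while the model name, `max_batch_size`, and dims are assumptions based on the infer script's reported shapes.

```
name: "resnet101_graphdef"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "data"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 224, 224, 3 ]
  }
]
output [
  {
    name: "fc1/add_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```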

Thanks.

Hi,

Thanks for the clarification.
We will check this internally and update more information with you later.

Thanks.

Hi,
Sorry for the late reply.
Is it possible to provide the model you used for a local repro?

Hi,
I cannot share the model I am using as it is proprietary. I have converted an off-the-shelf ResNet101 to GraphDef format, which gives a similar issue. Here is the link from where you can get the model and the script to run it:

Hi,
I tried to run your model, but I have no experience with TensorFlow. I tried with tensorflow-gpu 1.14 and 1.15 and always hit the error below. Can you specify how we should run it first?

init

WARNING:tensorflow:From infer.py:16: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From infer.py:17: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

Traceback (most recent call last):
File "infer.py", line 69, in <module>
sess,tf_inp,tf_out = tf_init(model_path)
File "infer.py", line 19, in tf_init
graph_def.ParseFromString(str1)
google.protobuf.message.DecodeError: Error parsing message

Hi,
I think while importing, use:

import tensorflow.compat.v1 as tf

I have the code working with both tf versions 1.14 (GPU) and 2.3.1.
Secondly, make sure the model is downloaded properly (go to the link and download the .pb file separately). It should be a 170 MB file. It is a Git LFS file, so downloading the folder may lead to an improper download (I faced the issue myself; only the pointer to the file was downloaded).

If this does not work, then there may be an issue with the protobuf version. I have tried this with the following environments:
tensorflow==2.3.1, protobuf==3.6.0
and
tensorflow-gpu==1.14.0, protobuf==3.11.1

This should work, as I have tried fresh installs of TensorFlow on multiple systems and made the code run.

If any further help is required, you can reply here.

Thanks.

The pb file is just 134 bytes:
-rw-rw-r-- 1 amyc amyc 134 Nov 5 18:33 tf_frozen_resnet101.pb
I downloaded the zip package, and I also tried downloading with
git clone https://github.com/sherlockdutta/models.git
but got only README.md:
amyc@amycserver:~/work/models$ ls
README.md
Anything wrong?

Finally downloaded it by opening the link to the file directly.
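A tiny file like the 134-byte .pb above is the classic symptom of downloading a Git LFS pointer instead of the real binary. A quick sanity check (pure Python sketch; the file path is whatever you downloaded):

```python
def looks_like_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer file, not real data.

    LFS pointers are tiny text files that start with a fixed version line,
    whereas a real frozen .pb is many megabytes of binary protobuf.
    """
    with open(path, "rb") as f:
        head = f.read(64)
    return head.startswith(b"version https://git-lfs.github.com/spec/v1")
```

If this returns True, re-download the file via the web UI or run `git lfs pull` in the clone.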

Hi,
Sorry, I did not upload a proper README for getting the setup working and downloading the model file. I will update the README. The model is on the master branch, not the main branch.
Reply if any further help is required.

Thanks.


Hi,
We can run your model successfully on a Tesla P4 or T4 card. Please let me know whether the issue in your description is fixed or not.

root@27c40e1cfcbe:~/workspace/models-master# CUDA_VISIBLE_DEVICES=1 python infer.py

init

Tensor("import/data:0", shape=(?, 224, 224, 3), dtype=float32) Tensor("import/softmax:0", shape=(?, 1000), dtype=float32)
2020-11-25 10:34:46.178999: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-11-25 10:34:46.206023: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2394595000 Hz
2020-11-25 10:34:46.206253: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x992a5e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:34:46.206287: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-25 10:34:46.209399: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-25 10:34:46.332165: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x8f8e670 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:34:46.332226: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P4, Compute Capability 6.1
2020-11-25 10:34:46.333680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:04:00.0
2020-11-25 10:34:46.334261: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-25 10:34:46.336358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-25 10:34:46.338268: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-11-25 10:34:46.338862: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-11-25 10:34:46.341406: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-11-25 10:34:46.343472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-11-25 10:34:46.348319: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-25 10:34:46.349768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-11-25 10:34:46.349836: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-25 10:34:46.351017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 10:34:46.351039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-11-25 10:34:46.351061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-11-25 10:34:46.352481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5328 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:04:00.0, compute capability: 6.1)

init-exit

1.15.2
(1, 224, 224, 3)
2020-11-25 10:34:49.038501: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-25 10:34:49.173622: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-25 10:34:50.447730: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-11-25 10:34:50.456144: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-11-25 10:34:50.499009: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-11-25 10:34:50.507382: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
tf_processing : exit!!

Hi @amycao,
I am also able to run the infer script for the model, but the same model does not run inference through the nvinferserver plugin. It gives the following error:

ERROR: infer_trtis_server.cpp:202 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:2 root error(s) found.
  (0) Failed precondition: Attempting to use uninitialized value stage2_unit1_sc_bias
	 [[{{node stage2_unit1_sc_bias/read}}]]
	 [[fc1/add_1/_3]]
  (1) Failed precondition: Attempting to use uninitialized value stage2_unit1_sc_bias
	 [[{{node stage2_unit1_sc_bias/read}}]]
0 successful operations.
0 derived errors ignored.

If you remove these lines from infer.py, it gives similar errors:

init = graph.get_operation_by_name("import/init")
sess.run(init)

I want to run the model in DeepStream (using nvinferserver). My guess is that the graph is not initialized while being loaded in DeepStream. That is why I shared the model file along with the script, so that you can run and test it.
The main objective is to get the graph running on DeepStream.
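One likely culprit, offered here as an assumption rather than a confirmed diagnosis: a serving backend such as Triton loads the GraphDef but never runs its `init` op, so any `Variable` nodes remaining in the graph stay uninitialized. The usual remedy is to bake the variable values into constants before export. A sketch using a tiny stand-in graph (names `data`/`out` are illustrative, not the model's real tensors):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

with tf.Graph().as_default() as g:
    x = tf.placeholder(tf.float32, [None, 4], name="data")
    w = tf.Variable(tf.ones([4, 2]), name="w")
    y = tf.matmul(x, w, name="out")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Replace Variable nodes with Const nodes holding their current
        # values, so the exported GraphDef needs no init op at serving time.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, g.as_graph_def(), ["out"])
```

A GraphDef frozen this way can be loaded and run without calling any `init` op first, which is the situation nvinferserver/Triton puts the model in.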

Were you able to test it in a DeepStream pipeline?

Thanks.