Output changes for the same input when the neural net is run several times?

abaranappu · August 20, 2018, 10:51am

Ubuntu 16.04
NVIDIA 1080 ti
driver 384
CUDA 9.0.176
CUDNN 7.1.4
Python 3.5
Tensorflow 1.8
TensorRT 4

We created a neural network, froze it, converted it to uff, and ran it from C++ (based on sampleUffMNIST.cpp in the TensorRT 4 samples). The output changes for the same input whenever we run it several times.

The problem occurs even with a simple 4 layer residual network and one output. Basically, given the same uff file, the portion of the C++ code that parses it and builds the network does it differently every time.
After that the code runs fine; I’ve added a loop around execute to show that the outputs are the same after loading the uff file.
So I suspect something is wrong with the uff file that is causing the parser to mis-parse it every time.

Please help me with this.
Can you reproduce the error?
The outputs differ at the 4th decimal place even when run only 10 times in a row, and differences at the 4th decimal place cannot be due to rounding errors.
Is anyone familiar with this error? Are there any solutions?
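For a sense of scale on the rounding question, the sketch below estimates the gap between adjacent float32 values near a typical output. It assumes the outputs are stored as IEEE 754 float32 and that the value is positive and finite; `float32_ulp` is a hypothetical helper, not part of TensorRT, and 0.4 is an arbitrary illustrative value. One rounding step near 0.4 is about 3e-8, so a 1e-4 difference is thousands of steps wide. Accumulated rounding over many operations can exceed a single step, so this is only a rough guide.

```python
import struct


def float32_ulp(x):
    """Gap between x (rounded to float32) and the next larger float32.

    Assumes x is positive and finite.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lo = struct.unpack("<f", struct.pack("<I", bits))[0]
    # For positive finite values, bits + 1 is the next representable float32.
    hi = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return hi - lo


step = float32_ulp(0.4)   # 0.4 is an illustrative value, not a measured output
print(step)               # spacing of float32 values near 0.4
print(1e-4 / step)        # how many float32 steps fit in a 1e-4 difference
```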

The problem increases as we add more layers.

Is this a problem with the TensorRT version, memory allocation, or something else?

Have any of you faced a similar issue? Your answers are welcome.

abaranappu · August 23, 2018, 2:54pm

python code snippet

N = 9
NUM_FILTERS = 32
NUM_INPUT_CHANNELS = 10

board_input2 = tf.placeholder(tf.float32, shape=(NUM_INPUT_CHANNELS, N, N), name='input_node')
board_input3 = tf.reshape(board_input2, [N, N, NUM_INPUT_CHANNELS])

h11 = convLayer(board_input3, fill_conv4_matrix(NUM_INPUT_CHANNELS, NUM_FILTERS, 3, L1W), layer_num=0)

value_output = tf.sigmoid(tf.reshape(h11, [N*N*32]), name='v_output')

// C++ code snippet

parser->registerInput("input_node", Dims3(10, 9, 9), UffInputOrder::kNCHW);
parser->registerOutput("v_output");

NVES · August 27, 2018, 5:21pm

Hello,

This is not expected. abaranappu, can you provide the uff file and the uff.txt file used for this experiment? What was the input data given to the C++ program? Did that stay constant?

thanks

abaranappu · August 28, 2018, 3:20pm

Thank you very much for replying.

The uff, text, and C++ files are here: https://github.com/hdpoorna/nv_forum
They are also attached as a zip file.

simple.uff has one output; full.uff has two outputs.
Both give inconsistent outputs with the same input. In the case of full.uff, it is nearly always the second output that has errors, and rarely the first. The error happens even if we switch the order of the two outputs.
files.zip (215 KB)

NVES · August 28, 2018, 4:19pm

Thank you for the information. Will keep you updated on what we find.

NVES · August 29, 2018, 9:47pm

Hello, we are unable to repro this issue. We tried running simple.uff around 10 times and saw the same output all 10 times. For full.uff we only see some difference in the 7th or 8th decimal places.

Can you re-run with info level logging (kINFO) and compare the logs between two runs with varying outputs?
I don’t think this is a parser issue because the parser will generate the same network given the same uff file.

It is possible that consecutive runs select different kernels, which may cause minor differences in the outputs.
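One reason different kernels can give slightly different results is that floating-point addition is not associative, so the order in which partial sums are accumulated matters. The sketch below shows this with plain Python floats (float64, whereas the GPU kernels here would typically use float32). The function names and input values are illustrative assumptions, not anything taken from TensorRT.

```python
def sum_left_to_right(values):
    """Accumulate in index order, like a simple sequential loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Accumulate by recursively splitting in half, like a tree reduction."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


values = [0.1, 0.2, 0.3]  # illustrative inputs, not taken from the network
a = sum_left_to_right(values)
b = sum_pairwise(values)
print(a == b)       # False: the two orders round differently
print(abs(a - b))   # tiny, on the order of 1e-16 for these float64 values
```

Differences of this kind stay near the last few significant digits, which is consistent with variation in the 7th or 8th decimal place but not with variation in the 4th.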

Can you tell me to what degree your outputs vary?

abaranappu · August 30, 2018, 3:27am

The second output, v_output (batch size 2, dim 1), is constantly changing, but p_output (batch size 2, dim 82) is not.

Is uff version 0.3.0 ok?

The results are in the attached zip file:
full_cpp.txt - results from the C++ file
full_py.txt - results from running the same thing in Python
Uff_mcts.cpp - C++ file

I will also post the results when the outputs are swapped when we create the uff file.

Also, could you help me run the kINFO log?
results.zip (6.66 KB)

abaranappu · August 30, 2018, 4:03am

The results when the outputs are swapped when we create the uff file are attached here.

Please note that the 2nd output is printed 1st.

We changed the line as follows:
output_node_names = ['p_output', 'v_output'] -----> output_node_names = ['v_output', 'p_output']

before passing it to:
uff_model = uff.from_tensorflow(final_output_graph_def,
                                output_node_names,
                                intput_node='input_node',
                                output_filename="models/full.uff",
                                text=True)
outputs_swapped.zip (3.53 KB)

nllim30 · August 30, 2018, 4:30am

Hi, I’m abaranappu’s supervisor, and thanks for taking the time to help us resolve this.

The second output, which is changing rapidly, refers to the output assigned to buffer 2, whereas the first output refers to that in buffer 1.

To further complicate the issue, the first output is also not consistent, but one will need to run around 30-40 times, with the same input, before seeing problems.

The neural network is a simple residual CNN (4 blocks) with two heads: one with an MLP layer leading to ‘p_output’, the other with another MLP layer leading to ‘v_output’.

Please help.

NVES · August 30, 2018, 5:43pm

Hello,

We tried running the experiment with full.uff around 20 times and saw the differences in v_output. The info level log doesn’t show any difference in the layers being run, so this is not a parser issue. The difference needs to be investigated further.

Regarding your questions:
“Is uff version 0.3.0 ok?”
UFF 0.4.0 is the version shipped with TRT 4.0 GA, so that is recommended (although I don’t see this as a UFF issue).
“Can you share documentation on how to enable info level logging (kINFO)?”
To turn on INFO level logging, change the severity of the logger object from kWARNING to kINFO.

abaranappu · August 31, 2018, 4:16am

Thank you so very much for helping.
Please help us figure out the issue.

NVES · August 31, 2018, 10:30pm

Hello,

Quick update. We are making progress debugging this issue. One thing that will help us a lot is a figure or description of what your network looks like: specifically, what each layer is doing, the kernel sizes, connections, etc.

thanks.

NVIDIA Enterprise Support

abaranappu · September 1, 2018, 2:25am

Here is the network part of the code. The whole code and weights are attached in a zip file.

board_input2 = tf.placeholder(tf.float32, shape=(None, NUM_INPUT_CHANNELS, N, N), name='input_node')

layers = []
layers.append(convLayer(board_input2, fill_conv4_matrix(NUM_INPUT_CHANNELS, NUM_FILTERS, 3, L1W), layer_num=0))

for i in range(1, NUM_RES_BLOCKS + 1):    
    layers.append(res_block(layers[i-1], fill_4R_matrix(NUM_FILTERS, NUM_FILTERS, 3, WA[i-1]), fill_4R_matrix(NUM_FILTERS, NUM_FILTERS, 3, WB[i-1]), layer_num=i))

# value
value_head = convLayer(layers[NUM_RES_BLOCKS], fill_4R_matrix(NUM_FILTERS, 4, 3, vh_W), layer_num=1)
    
value_inv_output_logits = logits(tf.reshape(value_head, [-1, N*N*4]), fill_inv_matrix(64, vh_HW), vh_Hb, 'vh')

hidden_layer_output = tf.nn.relu(value_inv_output_logits)

value_output = tf.sigmoid(tf.matmul(hidden_layer_output, vh_sigL_W) + vh_sigL_b, name='v_output')             
    
# policy
policy_head = convLayer(layers[NUM_RES_BLOCKS], fill_4R_matrix(NUM_FILTERS, 4, 3, ph_W), layer_num=2)
       
pass_output = logits(tf.reshape(policy_head, [-1, N*N*4]), fill_inv_matrix(1, ph_pass_W), ph_pass_b, 'ph')

equiv_output_h1 = tf.reduce_mean(tf.nn.conv2d(policy_head, fill_4R_matrix(4, 4, 2*N - 1, ph_rest_W), strides=[1, 1, 1, 1], padding='SAME', data_format='NCHW'), 1)
equiv_output = tf.reshape(equiv_output_h1 + ph_rest_b, [-1, N*N])

policy_output_logits = tf.concat([equiv_output, pass_output], 1)

policy_output = tf.nn.softmax(policy_output_logits, name='p_output')

net.zip (196 KB)

NVES · September 6, 2018, 8:54pm

Hello abaranappu,

There was a bug in the initialization of the input buffer in the user’s code. Some of the values were uninitialized, thus having garbage values.

We added these lines in createMnistCudaBuffer to initialize the buffer correctly:

for (int i = 0; i < eltCount; ++i)
    inputs[i] = 0.0f;

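We found another bug in the user’s code.
This line:
buffers.reserve(nbBindings);
should be:
buffers.resize(nbBindings);
Assuming buffers is a std::vector, reserve only allocates capacity and leaves the size at 0, so accessing buffers[i] afterwards is undefined behavior; resize actually creates the elements.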

The input values are now fixed and unchanging. The layers of the engine are unchanged too now. The 2 V outputs are approximately equal, but their values change in some executions:

Iteration 1:
0 => 0.999999 : ***
1 => 0.999999 :
Iteration 2:
0 => 0.397986 : ***
1 => 0.39674 :
Iteration 3:
0 => 0.395557 : ***
1 => 0.394716 :
Iteration 4:
0 => 0.395557 : ***
1 => 0.394716 :
Iteration 5:
0 => 1 : ***
1 => 1 :
Iteration 6:
0 => 0.397986 : ***
1 => 0.39674 :
Iteration 7:
0 => 0.518526 : ***
1 => 0.513405 :

You mentioned your implementation is derived from sampleUffMNIST, and we are able to repeatedly run ./sample_uff_mnist with consistent outputs.

I recommend reviewing the user code.

nllim30 · September 7, 2018, 5:16am

Hi NVES,
Thanks for taking the time to check this. We have fixed the initialization, but the bug persists, and we can prove that TensorRT is optimizing at the expense of accuracy.

We’ve changed the batch size to 1 and the input to all zeros to simplify things. We’ve also added a loop around execute

for (int i = 0; i < 50; i++)
{
    execute(*engine);
}

to show that the output does not change within a run, but rather between runs, so something different is happening each time the model is parsed and the engine is created.

Below is an example of different outputs from two different runs; note that each output is repeated 50 times in each run. We’ve attached the logs for both runs.

Output from first run:

— OUTPUT —
0 => 0 :
1 => 0 :
2 => 0 :
3 => 0 :
4 => 0 :
5 => 0 :
6 => 0 :
7 => 0 :
8 => 0 :
9 => 0 :
10 => 0 :
11 => 0 :
12 => 0 :
13 => 0 :
14 => 0 :
15 => 0 :
16 => 0 :
17 => 0 :
18 => 0 :
19 => 0 :
20 => 0 :
21 => 0 :
22 => 0 :
23 => 0 :
24 => 0 :
25 => 0 :
26 => 0 :
27 => 0 :
28 => 0 :
29 => 0 :
30 => 0 :
31 => 0 :
32 => 0 :
33 => 0 :
34 => 0 :
35 => 0 :
36 => 0 :
37 => 0 :
38 => 0 :
39 => 0 :
40 => 0 :
41 => 0 :
42 => 0 :
43 => 0 :
44 => 0 :
45 => 0 :
46 => 0 :
47 => 0 :
48 => 0 :
49 => 0 :
50 => 0 :
51 => 0 :
52 => 0 :
53 => 0 :
54 => 1 : ***
55 => 0 :
56 => 0 :
57 => 0 :
58 => 0 :
59 => 0 :
60 => 0 :
61 => 0 :
62 => 0 :
63 => 0 :
64 => 0 :
65 => 0 :
66 => 0 :
67 => 0 :
68 => 0 :
69 => 0 :
70 => 0 :
71 => 0 :
72 => 0 :
73 => 0 :
74 => 0 :
75 => 0 :
76 => 0 :
77 => 0 :
78 => 0 :
79 => 0 :
80 => 0 :
81 => 0 :

1 eltCount
— OUTPUT —
0 => 1 : ***

Output from second run:

— OUTPUT —
0 => 0.0122097 : ***
1 => 0.0122097 :
2 => 0.0122097 :
3 => 0.0122097 :
4 => 0.0122097 :
5 => 0.0122097 :
6 => 0.0122097 :
7 => 0.0122097 :
8 => 0.0122097 :
9 => 0.0122097 :
10 => 0.0122097 :
11 => 0.0122097 :
12 => 0.0122097 :
13 => 0.0122097 :
14 => 0.0122097 :
15 => 0.0122097 :
16 => 0.0122097 :
17 => 0.0122097 :
18 => 0.0122097 :
19 => 0.0122097 :
20 => 0.0122097 :
21 => 0.0122097 :
22 => 0.0122097 :
23 => 0.0122097 :
24 => 0.0122097 :
25 => 0.0122097 :
26 => 0.0122097 :
27 => 0.0122097 :
28 => 0.0122097 :
29 => 0.0122097 :
30 => 0.0122097 :
31 => 0.0122097 :
32 => 0.0122097 :
33 => 0.0122097 :
34 => 0.0122097 :
35 => 0.0122097 :
36 => 0.0122097 :
37 => 0.0122097 :
38 => 0.0122097 :
39 => 0.0122097 :
40 => 0.0122097 :
41 => 0.0122097 :
42 => 0.0122097 :
43 => 0.0122097 :
44 => 0.0122097 :
45 => 0.0122097 :
46 => 0.0122097 :
47 => 0.0122097 :
48 => 0.0122097 :
49 => 0.0122097 :
50 => 0.0122097 :
51 => 0.0122097 :
52 => 0.0122097 :
53 => 0.0122097 :
54 => 0.0122097 :
55 => 0.0122097 :
56 => 0.0122097 :
57 => 0.0122097 :
58 => 0.0122097 :
59 => 0.0122097 :
60 => 0.0122097 :
61 => 0.0122097 :
62 => 0.0122097 :
63 => 0.0122097 :
64 => 0.0122097 :
65 => 0.0122097 :
66 => 0.0122097 :
67 => 0.0122097 :
68 => 0.0122097 :
69 => 0.0122097 :
70 => 0.0122097 :
71 => 0.0122097 :
72 => 0.0122097 :
73 => 0.0122097 :
74 => 0.0122097 :
75 => 0.0122097 :
76 => 0.0122097 :
77 => 0.0122097 :
78 => 0.0122097 :
79 => 0.0122097 :
80 => 0.0122097 :
81 => 0.0110169 :

1 eltCount
— OUTPUT —
0 => 0.384092 : ***
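A comparison like the one above can be automated. The following is a minimal sketch that parses lines of the form `index => value :` from two saved runs and reports the first position where they differ. The function names and the tolerance of 1e-6 are assumptions chosen for illustration, and the sample strings are the first three lines of each run shown above.

```python
import re

# Matches lines such as "54 => 1 : ***" or "0 => 0.0122097 :".
LINE_RE = re.compile(r"^\s*(\d+)\s*=>\s*(\S+)\s*:")


def parse_outputs(log_text):
    """Return the printed values, in order, ignoring non-matching lines."""
    values = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values.append(float(match.group(2)))
    return values


def first_mismatch(run_a, run_b, abs_tol=1e-6):
    """Return (position, a, b) for the first differing pair, or None."""
    for position, (a, b) in enumerate(zip(run_a, run_b)):
        if abs(a - b) > abs_tol:
            return position, a, b
    return None


run_1 = "0 => 0 :\n1 => 0 :\n2 => 0 :"
run_2 = "0 => 0.0122097 : ***\n1 => 0.0122097 :\n2 => 0.0122097 :"
print(first_mismatch(parse_outputs(run_1), parse_outputs(run_2)))
```

The position returned is the position in the parsed list, so if a log contains several output sections they are compared as one flat sequence.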

files.zip (135 KB)

NVES · September 11, 2018, 8:07pm

Hello,

TRT was not handling the horizontal merge of layers without bias weights correctly. We will fix this in a future version.

In the meantime, for TRT 4.0, users cannot use a convolution layer with no bias. Instead, bias values of 0.0f should be used for such layers. This has the same effect as no bias, but will get around the bug in TRT 4.0.
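To illustrate why a zero bias is numerically equivalent to no bias, here is a small pure-Python stand-in for a convolution layer (a 1D cross-correlation without padding). It is only a sketch: `conv1d_valid` and the sample values are hypothetical and are not TensorFlow or TensorRT code.

```python
def conv1d_valid(signal, kernel, bias=None):
    """Slide kernel over signal (no padding) and optionally add a bias."""
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        acc = sum(signal[start + j] * kernel[j] for j in range(k))
        if bias is not None:
            acc += bias
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0]   # illustrative input
w = [0.5, -1.0]            # illustrative weights

no_bias = conv1d_valid(x, w)
zero_bias = conv1d_valid(x, w, bias=0.0)
print(no_bias == zero_bias)  # True: adding 0.0 leaves every output unchanged
```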

We are sorry for any inconvenience this is causing. We cannot share more information about future releases here.
Please watch our announcements for more information.

nllim30 · September 12, 2018, 3:31am

Hi NVES,
Thanks for this information. We’ve added zero biases for all convolutional layers and we’ve vastly simplified the model so that it is almost entirely convolutional. However, the bug still resurfaces. Can you tell us which other layers are not supported? The model is specified below, and we’ve attached output logs.

def convLayer(input, W_val, layer_num, n_out):
    W = tf.get_variable('cW' + str(layer_num), initializer=W_val)
    h1 = tf.nn.conv2d(input, W, strides=[1, 1, 1, 1], padding='SAME')
    b = tf.get_variable("cb" + str(layer_num), [n_out], initializer=tf.zeros_initializer())
    # bias_add returns a new tensor, so the result must be assigned
    h1 = tf.nn.bias_add(h1, b)

    return tf.nn.relu(h1)

def logits(input, W_val, b_val, name):
    W = tf.get_variable(name + '_W', initializer=W_val)
    b = tf.get_variable(name + '_b', initializer=b_val)

    # multiply by the variable W rather than its initial value W_val
    return tf.nn.bias_add(tf.matmul(input, W), b)

def res_block(input, WA_val, WB_val, layer_num, n_out):
    WA = tf.get_variable('rWA' + str(layer_num), initializer=WA_val)
    h1 = tf.nn.conv2d(input, WA, strides=[1, 1, 1, 1], padding='SAME')
    b1 = tf.get_variable("rbA" + str(layer_num), [n_out], initializer=tf.zeros_initializer())
    h1 = tf.nn.bias_add(h1, b1)  # keep the biased tensor

    inter = tf.nn.relu(h1)

    WB = tf.get_variable('rWB' + str(layer_num), initializer=WB_val)
    h1b = tf.nn.conv2d(inter, WB, strides=[1, 1, 1, 1], padding='SAME')
    b2 = tf.get_variable("rbB" + str(layer_num), [n_out], initializer=tf.zeros_initializer())
    h1b = tf.nn.bias_add(h1b, b2)  # keep the biased tensor

    # skip connection: add the block input back to the activated output
    return tf.add(tf.nn.relu(h1b), input)

board_input2 = tf.placeholder(tf.float32, shape=(None, N, N, NUM_INPUT_CHANNELS), name='input_node')

layers = []
layers.append(convLayer(board_input2, L1W, layer_num=0, n_out=NUM_FILTERS))

for i in range(1, NUM_RES_BLOCKS + 1):
    layers.append(res_block(layers[i-1], WA[i-1], WB[i-1], layer_num=i, n_out=NUM_FILTERS))

value_head = convLayer(layers[NUM_RES_BLOCKS], vh_W, layer_num=1, n_out=4)
value_output = tf.sigmoid(value_head, name='v_output')

policy_head = convLayer(layers[NUM_RES_BLOCKS], ph_W, layer_num=2, n_out=4)
pass_output = logits(tf.reshape(policy_head, [-1, N*N*4]), ph_pass_W, ph_pass_b, 'ph')
policy_output = tf.nn.softmax(pass_output, name='p_output')
files2.zip (98.8 KB)

NVES · September 13, 2018, 6:30pm

Hello,

Unfortunately, we don’t think there’s a workaround with TRT 4. Bias values of zero have no effect on a convolution layer’s output, so TRT removes them. Adding zero bias to the network was only meant as a temporary solution, and it looks like it doesn’t work.

We do have a fix in a future release of TRT.

Sorry again that we cannot share more information about future releases here.
Please watch our announcements for more information.

abaranappu · October 29, 2018, 6:19am

Hi,

Ubuntu 16.04
NVIDIA GeForce RTX 2080 Ti
NVIDIA driver 410.66
CUDA 10.0.130
CUDNN 7.3.1
Python 3.5
Tensorflow 1.12.0-rc1
TensorRT 5.0.0.10 RC

The bug is still there.
The outputs are attached.
We also switched the outputs so that it’s clear.
outputs_trt5.rtf (28.4 KB)

NVES · October 30, 2018, 3:15pm

Hello,

My apologies, the fix did not go into TRT 5.0.0.10. The fix is committed for TRT 5 GA. Please stay tuned for the release announcement.
