TensorRT added layer before output

I have been working on TensorRT conversions from TensorFlow frozen graphs.

I wonder why this ILayer appears in the conversion from UFF with the official parser:

conv1_bn/FusedBatchNorm_1 (32, 112, 111)
<b>conv1_relu/Relu6_HL_1804289383 (32, 112, 111)</b>
conv1_relu/Relu6 (32, 112, 111)

or this

reshape_1/Reshape (1, 1, 1024)
(Unnamed Layer* 226) [Shuffle] (1, 1, 1024)
conv_preds/convolution (1024, 1, 1)
conv_preds/BiasAdd (1000, 1, 1)
(Unnamed Layer* 234) [Shuffle] (1000, 1, 1)
reshape_2/Reshape (1, 1, 1000)
<b>act_softmax/Softmax_HL_846930886 (1000, 1, 1)</b>
act_softmax/Softmax (1000, 1, 1)

In both fragments, the last line was marked as output for the parser and the tf-uff converter, and the bold one is created automatically.

Accuracy is fine if I execute the whole model; however, if I “cut” the model, these layers appear and break the results… What is the purpose of this layer? Is there a way to prevent the parser from generating it? I checked the file created by the tf-uff converter (pbtxt) and that layer is not present there, so I assume it is a matter of TensorRT optimization.

I’m using TensorRT 6 and TF 1.14.
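For context, layer dumps like the ones above can be produced by walking the parsed INetworkDefinition. A minimal sketch (assuming a TensorRT 6 Python environment and the input name/shape of my model, so treat it as illustrative only):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with trt.Builder(logger) as builder, builder.create_network() as network, trt.UffParser() as parser:
    parser.register_input("input_1", (3, 224, 224))
    parser.parse("./my_model.uff", network)
    # print each layer's name and the shape of its first output
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        print(layer.name, tuple(layer.get_output(0).shape))
```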



Hi Ignacio,

Can you put together a small repro package with (1) TF model, (2) uff-converter command, and (3) script/commands to parse UFF -> TensorRT?

Also the usual info just for completeness:

Provide details on the platforms you are using:
Linux distro and version
GPU type
Nvidia driver version
CUDA version
CUDNN version
Python version [if using python]
Tensorflow version
TensorRT version
If Jetson, OS, hw versions

Describe the problem


Include any logs, source, models (.uff, .pb, etc.) that would be helpful to diagnose the problem.

If relevant, please include the full traceback.


Please provide a minimal test case that reproduces your error.

Sorry for the late response.

Jetson nano image release 4.3
Nvidia driver version: Preinstalled driver
Tegra release: REVISION: 3.1, GCID: 18186506, BOARD: t210ref, EABI: aarch64, DATE: Tue Dec 10 06:58:34 UTC 2019
CUDA version: 10.0
CUDNN version: 7.6.3
Python version 3.6.9
Tensorflow version: 1.14
TensorRT version: 6.0.1

I have solved the “problem” I had by changing the output of the cut model to the layer added during the UFF-TensorRT parsing process. In short, the output of the original model at layer 20 is not the same as the output given by a model that only includes those first 20 layers, because the UffParser adds a new layer. It seems that in the sample case I provide, the output operation is performed twice, thus changing the outputs of the pre-trained network.
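To illustrate why applying the output operation twice breaks the results, here is a minimal NumPy sketch with made-up logits, assuming the final op is the Softmax from the second fragment: feeding a softmax its own output flattens the distribution, so the duplicated layer no longer matches the original.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])   # hypothetical pre-softmax activations
once = softmax(logits)               # the network's intended output
twice = softmax(once)                # what a duplicated Softmax layer would compute

print(np.allclose(once, twice))      # → False: the probabilities differ
```

Note that the argmax (the top-1 prediction) is preserved, which would explain why whole-model accuracy can still look reasonable even when the probability values are wrong.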

After some tests I realized that the added layer was actually providing the original output of the layer. I am providing two .pb models with their text descriptions, as well as the corresponding .uff model for each.

As I see it, there may be no problem, as the accuracy of inference on the complete model is only 1 or 2% below the original, which I consider fair enough considering the performance gain.

I cannot upload the models to the post, so you can find them here: https://drive.switch.ch/index.php/s/3BY3X0dEF3m2JKb

As the UFF-TensorRT parser I used your sample provided under /usr/src/tensorRT/samples/python/ with a couple of modifications to implement the functionality mentioned above, changing the output of the “half” model to the tensor after the “usual” output:

import numpy as np
import tensorrt as trt
# allocate_buffers, do_inference and load_normalized_test_case are the helper
# functions from the TensorRT Python sample this script is based on.

logger = trt.Logger(trt.Logger.WARNING)
with trt.Builder(logger) as builder, builder.create_network() as network, trt.UffParser() as parser:
    builder.max_workspace_size = 1 << 30  # 1 GiB
    parser.register_input("input_1", (3, 224, 224))
    parser.parse("./my_model.uff", network)
    # mark the second-to-last layer (the tensor after the "usual" output) as output
    network.mark_output(network.get_layer(network.num_layers - 2).get_output(0))
    engine = builder.build_cuda_engine(network)

    inputs, outputs, bindings, stream = allocate_buffers(engine)
    with engine.create_execution_context() as context:
        case_num = load_normalized_test_case(pagelocked_buffer=inputs[0].host)
        # For more information on performing inference, refer to the introductory samples.
        # The common.do_inference function will return a list of outputs - we only have one in this case.
        [output] = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
        pred = np.argmax(output)
        print("Prediction: " + str(pred))

As my problem is “solved”, I am sending this information just in case there is some kind of bug. I understand from your answer that the creation of these layers just before the output is not something expected, so I would like to understand why it happens.

If I can provide something else, do not hesitate to ask.

No worries, thanks for the detailed explanation, and I’m glad your issue was solved.

Just curious: I notice you call parser.register_input(input_name, input_shape), but not parser.register_output(output_name). Perhaps that could be the cause of the mismatched output layer? There is an example of registering both inputs and outputs for a ResNet50 model in /usr/src/tensorRT/samples/python/introductory_parser_samples/uff_resnet50.py for reference.
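A minimal sketch of what that would look like, using the node names from the logs earlier in the thread (this requires a TensorRT 6 environment, so treat it as illustrative rather than tested):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with trt.Builder(logger) as builder, builder.create_network() as network, trt.UffParser() as parser:
    parser.register_input("input_1", (3, 224, 224))
    # register the graph's real output node so the parser does not have to infer it
    parser.register_output("act_softmax/Softmax")
    parser.parse("./my_model.uff", network)
```

With the output registered explicitly, the parser should not need to synthesize an output layer of its own, which may remove the extra `_HL_` layer you observed.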