Hi everyone! I’m currently working with machine learning inference on my Jetson Nano.
I trained a simple network with convolutional, MaxPooling2D and dense layers and exported it to a Keras (.h5) file. The model takes a 64x48 image as input and has two output layers.
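For context, the network is along these lines (a minimal sketch; the exact layer sizes and output names here are placeholders, not my real ones):

```python
from tensorflow import keras
from tensorflow.keras import layers

# 64x48 RGB input, conv + maxpool + dense, two output heads
inputs = keras.Input(shape=(64, 48, 3), name="input_1")
x = layers.Conv2D(16, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(32, activation="relu")(x)
out1 = layers.Dense(1, activation="sigmoid", name="out_1")(x)
out2 = layers.Dense(1, activation="sigmoid", name="out_2")(x)

model = keras.Model(inputs, [out1, out2])
model.save("model.h5")
```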
So, when I run inference with a black image ( np.zeros((64, 48, 3)) ) on TensorFlow, TFLite and TF-TRT, I get the same output from all three, approximately 0.56. The problem is that when I run the very same input through the same model on TensorRT, the prediction doesn't match the previous engines and comes out at approximately 0.74.
I converted the .h5 file to a TRT inference graph (.pb) and then generated a .uff file from the .pb by running:
python3 (...)python3.6/dist-packages/uff/bin/convert_to_uff.py model.pb
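The exact flags I passed are along these lines (the path is abbreviated and the output-node names are placeholders; the flags are from memory, so please check them against the converter's --help):

```
# list the graph nodes first to confirm the real input/output names
python3 (...)python3.6/dist-packages/uff/bin/convert_to_uff.py model.pb -l

# then convert, naming the two output nodes explicitly
python3 (...)python3.6/dist-packages/uff/bin/convert_to_uff.py model.pb -o model.uff -O out_1/Sigmoid -O out_2/Sigmoid
```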
I double-checked every stage of the pipeline from the .h5 to the final .uff, and tried two different implementations:
.h5 -- trt.create_inference_graph --> .pb --> convert_to_uff.py --> .uff
.h5 -- trt.create_inference_graph --> .pb --> import to tensorrt and run uff.from_tensorflow_frozen_model(...)
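The second variant looks roughly like this (a sketch, assuming I remember the signature correctly; the output node names are placeholders for the real ones in my graph):

```python
import uff

# Python-API variant of the .pb -> .uff conversion
uff_model = uff.from_tensorflow_frozen_model(
    frozen_file="model.pb",
    output_nodes=["out_1/Sigmoid", "out_2/Sigmoid"],
    output_filename="model.uff",
)
```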
Both implementations gave the same wrong output.
I also transposed the input matrix and registered the input and output layers on the parser. A weird phenomenon: if I run TensorRT inference without registering the input layer, I get an error. But if I don't register one or both output layers, or if I register them under arbitrary names, the output is exactly the same (wrong) value and no error is raised.
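To be concrete about the registration step (the full code follows below), it looks roughly like this; the layer names and the CHW input shape are placeholders matching my np.zeros((64, 48, 3)) layout:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network() as network, \
        trt.UffParser() as parser:
    # input registered in CHW order, hence the transpose of the image later
    parser.register_input("input_1", (3, 64, 48))
    parser.register_output("out_1/Sigmoid")
    parser.register_output("out_2/Sigmoid")
    parser.parse("model.uff", network)

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 28
    engine = builder.build_cuda_engine(network)
```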
Below I give my source code, hoping to get some feedback on it. Have I missed some flag, or am I making a mistake when sending the input data to the host?
2) Another doubt I have: since memory is physically shared between the CPU and GPU on the Jetson Nano, is it still necessary to allocate the buffers in "device memory" (duplicating the data in the same physical memory)? And is it necessary to copy the input data (the image) to another buffer (the host_input)?
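This is the buffer setup my question refers to: a pagelocked host buffer plus a separate device allocation, with an explicit copy of the image into host_input and then host -> device (a sketch of the usual pycuda pattern, variable names are just illustrative):

```python
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context

# one host buffer + one device buffer for the input binding
host_input = cuda.pagelocked_empty(3 * 64 * 48, dtype=np.float32)
device_input = cuda.mem_alloc(host_input.nbytes)

image = np.zeros((64, 48, 3), dtype=np.float32)
# HWC -> CHW, flatten, and copy into the pagelocked host buffer
np.copyto(host_input, image.transpose(2, 0, 1).ravel())

stream = cuda.Stream()
cuda.memcpy_htod_async(device_input, host_input, stream)
# ...enqueue the engine execution and copy the outputs back the same way...
stream.synchronize()
```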
Hope I can get some help :)