We have been testing TensorFlow and TensorRT on a Drive PX2 and are unable to get matching results. We have compiled TensorFlow 1.8 on the DPX2 (aarch64), and all of our tests run on the device. We have a network trained to produce 20 points (predicted trajectories). Our TensorFlow workflow is as follows:
- 1920x1208 Sekonix AR0231 H.264 frames are captured, stored, and converted to PNG using ffmpeg.
- For training, each image is first resized to 480x302 and then cropped to 336x140.
- The network is trained on the 336x140 (resized and cropped) images in HxWxC format (336x140x3), the layout TensorFlow expects.
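For context on the layout difference between the two frameworks: TensorFlow trains on channels-last (HWC) tensors, while TensorRT consumes channels-first (CHW). A minimal CPU sketch of the reordering (the function name and float element type are illustrative, not from our code):

```cpp
#include <cstddef>
#include <vector>

// Reorder a channels-last (HWC) tensor into channels-first (CHW).
// Illustrative sketch; in our pipeline H=140, W=336, C=3.
std::vector<float> hwcToChw(const std::vector<float>& hwc,
                            std::size_t H, std::size_t W, std::size_t C) {
    std::vector<float> chw(hwc.size());
    for (std::size_t y = 0; y < H; ++y)
        for (std::size_t x = 0; x < W; ++x)
            for (std::size_t c = 0; c < C; ++c)
                chw[c * H * W + y * W + x] = hwc[(y * W + x) * C + c];
    return chw;
}
```

A mismatch in this reordering (or applying it twice, or not at all) is one of the more common causes of wildly different network outputs between the two runtimes.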
Our TensorRT workflow on the DPX2:
- The above model is converted to UFF in CxHxW format (3x140x336, as required by TensorRT) and then to bin format using the TensorRT optimizer tool on the DPX2, with no optimization options enabled.
- A 1920x1208 AR0231 H.264 image is acquired at runtime or loaded from a serialized file.
- Using our own CUDA kernel, we resize the image to 480x302.
- Using our own CUDA kernel, we crop the image to 336x148.
- We initialize the TensorRT data conditioner (dwDataConditioner_initialize) with network dimensions 336x148 and the ignoreAspectRatio option set to true. We have also tried turning this option off; the results change, but they still do not match TensorFlow's.
- We run a single inference using dwDataConditioner_prepareData with an ROI of x=0, y=0, width=network width (336), height=network height (140), and our resized/cropped image as input to the network. Our image is single-plane interleaved RGBA; the documentation states that for an interleaved image the A channel is dropped.
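One way to rule out our resize/crop kernels as the source of the mismatch is to diff their output against a simple CPU reference. The sketch below assumes interleaved RGBA (4 bytes per pixel) and nearest-neighbor sampling; the function names are hypothetical. Note that if training-time preprocessing used bilinear interpolation (TensorFlow's default in tf.image.resize), a nearest-neighbor kernel at inference time would already introduce a pixel-level discrepancy:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference: nearest-neighbor resize of an interleaved RGBA buffer
// from (sw x sh) to (dw x dh). Hypothetical helper for diffing against
// the CUDA kernel; not the kernel itself.
std::vector<uint8_t> resizeNearestRGBA(const std::vector<uint8_t>& src,
                                       std::size_t sw, std::size_t sh,
                                       std::size_t dw, std::size_t dh) {
    std::vector<uint8_t> dst(dw * dh * 4);
    for (std::size_t y = 0; y < dh; ++y) {
        std::size_t sy = std::min(y * sh / dh, sh - 1);
        for (std::size_t x = 0; x < dw; ++x) {
            std::size_t sx = std::min(x * sw / dw, sw - 1);
            for (std::size_t c = 0; c < 4; ++c)
                dst[(y * dw + x) * 4 + c] = src[(sy * sw + sx) * 4 + c];
        }
    }
    return dst;
}

// CPU reference: crop a (cw x ch) window at offset (ox, oy) out of an
// interleaved RGBA buffer of width sw.
std::vector<uint8_t> cropRGBA(const std::vector<uint8_t>& src,
                              std::size_t sw,
                              std::size_t ox, std::size_t oy,
                              std::size_t cw, std::size_t ch) {
    std::vector<uint8_t> dst(cw * ch * 4);
    for (std::size_t y = 0; y < ch; ++y)
        for (std::size_t x = 0; x < cw; ++x)
            for (std::size_t c = 0; c < 4; ++c)
                dst[(y * cw + x) * 4 + c] =
                    src[((oy + y) * sw + (ox + x)) * 4 + c];
    return dst;
}
```

Running both the kernels and this reference on the same serialized frame and comparing byte-for-byte would isolate whether the divergence starts in preprocessing or in the network itself.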
Given this workflow, the results we get from TensorRT on the DPX2 are very different from TensorFlow's.
NOTE: In our latest test, using a new CUDA kernel, we converted the final image from single-plane interleaved RGBA to a 3-plane RGB image, and the results (the 20 output points) still do not match. Any idea what we may be doing wrong?
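For completeness, here is a CPU sketch of the interleaved-RGBA-to-planar-RGB conversion that the new kernel performs, assuming the alpha channel is dropped and plane order is R, G, B (the function name is illustrative). Diffing this against the kernel output would confirm the conversion itself is correct:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved RGBA buffer (W x H, 4 bytes/pixel) into a
// 3-plane RGB buffer (R plane, then G, then B). Alpha is discarded.
// Hypothetical CPU reference, not the CUDA kernel itself.
std::vector<uint8_t> rgbaToPlanarRGB(const std::vector<uint8_t>& rgba,
                                     std::size_t W, std::size_t H) {
    std::vector<uint8_t> rgb(W * H * 3);
    for (std::size_t i = 0; i < W * H; ++i)
        for (std::size_t c = 0; c < 3; ++c)  // c == 3 (alpha) is skipped
            rgb[c * W * H + i] = rgba[i * 4 + c];
    return rgb;
}
```

If this conversion checks out, the remaining suspects are the interpolation mode of the resize, the channel order (RGB vs BGR), any mean/scale normalization the data conditioner applies that training did not (or vice versa), and the 336x140 vs 336x148 dimension difference between the training crop and the runtime crop/conditioner setup.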