Hello all! As the topic title suggests, I have noticed some prediction discrepancies between what should otherwise be identical models. As I can generate a TensorRT 5.1.5 engine based on that model without error, I am left without an obvious place or strategy to hunt down the issue. My workflow (and environment description) is below (all using the relevant Python APIs):
1. Generate .pb from a TF-1.13.1 + Keras model, after disabling training nodes & freezing.
2. Generate .onnx (both NCHW & NHWC) from .pb
3. Ensure that predictions are identical between “TF + .pb” and “ONNX Runtime + .onnx”
4. Generate .trt (both NCHW & NHWC) from .onnx
5. Ensure that predictions are identical between “ONNX Runtime + .onnx” and “TensorRT 5.1.5 + .onnx”
I make it all the way to (5) without issue, warning, or error; but the predictions from “ONNX Runtime + .onnx” are correct and the predictions from “TensorRT 5.1.5 + .onnx” are incorrect.
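A prediction check of this kind can be sketched roughly as follows. This is only a minimal illustration, assuming the outputs of both runtimes are already available as NumPy arrays for the same input; the helper name `compare_predictions` and the tolerance values are hypothetical choices, not part of any of the libraries mentioned here.

```python
import numpy as np


def compare_predictions(reference, candidate, rtol=1e-3, atol=1e-5):
    """Summarize how far `candidate` deviates from `reference`.

    Hypothetical helper: the name and default tolerances are assumptions.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)

    if reference.shape != candidate.shape:
        # Differing shapes can hint at an NCHW/NHWC mix-up between exports.
        return {
            "match": False,
            "reason": "shape mismatch",
            "shapes": (reference.shape, candidate.shape),
        }

    abs_diff = np.abs(reference - candidate)
    # Location of the largest element-wise deviation.
    worst = np.unravel_index(np.argmax(abs_diff), abs_diff.shape)
    return {
        # allclose tests |candidate - reference| <= atol + rtol * |reference|
        "match": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "worst_index": tuple(int(i) for i in worst),
    }
```

Calling it with the ONNX Runtime output as `reference` and the TensorRT output as `candidate` gives a quick sense of whether the mismatch is a small numerical drift or a gross error, and where in the output tensor it is largest.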
I figure this will be hard, if not impossible, for anyone here to debug remotely, which I completely understand. So, in lieu of that, I was wondering if anyone could suggest any tools or techniques I can use to hunt down the discrepancy? Incidentally, this is the second model I have shepherded through this process; the first worked perfectly using nearly identical code… so I’m proper stuck :).
Apologies for such a general question, and thanks for your time and patience!
> dpkg -l | grep TensorRT
ii graphsurgeon-tf 5.1.5-1+cuda10.1 amd64 GraphSurgeon for TensorRT package
ii libnvinfer-dev 5.1.5-1+cuda10.1 amd64 TensorRT development libraries and headers
ii libnvinfer-samples 5.1.5-1+cuda10.1 all TensorRT samples and documentation
ii libnvinfer5 5.1.5-1+cuda10.1 amd64 TensorRT runtime libraries
ii python3-libnvinfer 5.1.5-1+cuda10.1 amd64 Python 3 bindings for TensorRT
ii python3-libnvinfer-dev 5.1.5-1+cuda10.1 amd64 Python 3 development package for TensorRT
ii tensorrt 5.1.5.0-1+cuda10.1 amd64 Meta package of TensorRT
ii uff-converter-tf 5.1.5-1+cuda10.1 amd64 UFF converter for TensorRT package
> pip show tensorflow onnx
Name: tensorflow
Version: 1.13.1
<snip>
Name: onnx
Version: 1.6.0
<snip>