I prepared an ONNX model with a conditional (If) node that executes one of two branches of the graph. The condition checks whether the result of a ReduceMax operation on the output of a subgraph is less than a provided constant. The then branch takes a long time to execute, while the else branch is almost immediate. To verify that the model performs lazy evaluation correctly, I ran performance tests in both ONNX Runtime and the TensorRT runtime (after conversion from ONNX using trtexec, opset 13). For each runtime I prepared two models: one in which the condition is always met and one in which it is never met, controlled by the value of the constant. In ONNX Runtime the execution times are as expected (the model whose condition is never met is much faster), which indicates that lazy evaluation is performed; TensorRT, however, clearly executes both branches eagerly.
I have a question about the above: is this behavior expected? The documentation I read suggests the branches should, at least in theory, be lazily evaluated. The purpose of my model is to avoid unnecessary computations, and with eager execution the model makes no sense.
For the evaluation in ONNX Runtime, run the file test_conditional_onnx.py with model_onnx_path = 'onnx_example_0.0.onnx' or model_onnx_path = 'onnx_example_1.0.onnx'. In onnx_example_0.0.onnx the condition is never met, which means much faster execution; in onnx_example_1.0.onnx the condition is always met, which means much slower execution.
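The measurement pattern behind such a comparison can be sketched as below. This is a hypothetical stand-in, not the attached test_conditional_onnx.py: in the real script an onnxruntime InferenceSession.run call on each model would play the role of the infer callables, and the warm-up/iteration counts are illustrative.

```python
# Sketch of a latency comparison between two models (hypothetical harness).
import time

def benchmark(infer, n_warmup=5, n_runs=50):
    """Return the mean latency in seconds of calling `infer` n_runs times."""
    for _ in range(n_warmup):          # warm-up to exclude one-time setup costs
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    return (time.perf_counter() - start) / n_runs

# Stand-ins for the "condition never met" (fast) and "condition always met"
# (slow) models; a session.run call would go here instead.
fast = lambda: sum(range(1_000))
slow = lambda: sum(range(100_000))

ratio = benchmark(slow) / benchmark(fast)  # > 1 when the slow branch dominates
```

With lazy evaluation, the ratio between the two real models should be large (as observed in ONNX Runtime); with eager execution of both branches, it should be close to 1 (as observed in TensorRT).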
However, after converting the above models to TRT with the following command: trtexec --onnx=/path_to_onnx --explicitBatch --saveEngine=/engine_saving_path --workspace=2048 --device=X --fp16, both have very similar execution times, which suggests that the conditional branches are executed eagerly.
Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:
1) Validate your model with the snippet below (check_model.py):
import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with the trtexec command.
In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!
I’ve just updated my post by uploading the two models and a script to run them in ONNX Runtime. They differ in which conditional branch is executed; with lazy evaluation, one of them should be much faster. This is the case in ONNX Runtime, but not after converting to TRT.
After conversion to TRT, inference times for the two models are almost the same. When running the script attached to my post, ONNX Runtime inference times for onnx_example_0.0.onnx are around 5-6 times lower than for onnx_example_1.0.onnx.
I generated trtexec outputs with the --verbose flag for both models. However, I cannot upload them because of the error “new users cannot put links in posts”. The results can be easily reproduced, though.
We don’t think Myelin does eager execution of both branches. Depending on the model, Myelin can potentially perform speculative execution, but it is hard to say what is going on in this particular model. We recommend you provide us an Nsight Systems (nsys) trace so we can check whether both branches are actually executed or whether the slowdown has some other cause. https://docs.nvidia.com/nsight-systems/UserGuide/index.html
Unfortunately, the generated report weighs over 0.5 GB (for each of the two models), which is too big to attach to my comment. How can I share it with you?
Hi everyone. Will this be fixed? We need to know whether we should look for a different technology to achieve our goal. Please let us know; thanks in advance!