TensorRT engine cannot be built due to workspace size even if it's set higher

Description

Hi,
I’ve recently been having trouble building a TRT engine for a YOLOv3 detector model. The original model was trained in TensorFlow (2.3), converted to ONNX (tf2onnx, most recent version, 1.8.3), and then I convert the ONNX model to TensorRT.

The exact error is the following:

[TensorRT] ERROR: …/builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node StatefulPartitionedCall/functional_3/tiny_yolov3/tf_op_layer_ArgMax/ArgMax.)
[TensorRT] VERBOSE: Builder timing cache: created 74 entries, 26 hit(s)
[TensorRT] ERROR: …/builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node StatefulPartitionedCall/functional_3/tiny_yolov3/tf_op_layer_ArgMax/ArgMax.)

The conversion code from ONNX to TRT is basically what’s done here:

I set a workspace of 1 << 30, which should be more than enough. I tried setting it differently (1 << 34, 2 << 23, etc.), but it didn’t help…
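(For reference, a quick sketch of what these shift expressions work out to in bytes; note that 2 << 23 is actually far smaller than 1 << 30, so it is not a higher setting at all:)

```python
# Byte values of the workspace sizes tried above.
for expr, nbytes in [("1 << 30", 1 << 30), ("1 << 34", 1 << 34), ("2 << 23", 2 << 23)]:
    print(f"{expr} = {nbytes:>11d} bytes = {nbytes / 2**20:>5.0f} MiB")
# 1 << 30 is 1 GiB and 1 << 34 is 16 GiB, but 2 << 23 is only 16 MiB,
# i.e. a much *smaller* workspace than the original 1 << 30.
```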
I’m attaching the verbose output of the conversion: error_trtexec_verbose.txt (498.1 KB).
I can’t share the relevant onnx model.
Why do I keep getting an error related to the workspace size, even when I set it higher? How can this be solved?

Environment

TensorRT Version : 7.1.2
CUDA Version : 11.0
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) : 3.6
TensorFlow Version (if applicable) : The model was trained on TF 2.3, converted to ONNX, and then converted to a TensorRT engine.
Device for TRT engine builder: Jetson AGX Xavier

Hi @weissrael,

Could you please confirm whether you are using the same Python script (from GitHub) as in the description?
The logs do not exactly match: they show no FP16 at all, but the script on GitHub has FP16 enabled.
You might not be setting the workspace correctly. For example, a common mistake is to call build_engine(network, config) but set the workspace with builder.max_workspace_size instead of on the config.
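A minimal sketch of the correct pattern (TensorRT 7.x Python API; the helper name and its arguments are my own): since build_engine(network, config) reads the workspace from the IBuilderConfig, the size must be set on the config, not on the builder.

```python
def build_engine(onnx_path, workspace_bytes=1 << 30):
    """Sketch: build a TensorRT engine from an ONNX file, setting the
    workspace on the IBuilderConfig (not on the builder)."""
    import tensorrt as trt  # requires a TensorRT install (e.g. via JetPack)

    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    # The crucial line: builder.max_workspace_size is ignored by
    # build_engine(network, config); the config value is what counts.
    config.max_workspace_size = workspace_bytes
    return builder.build_engine(network, config)
```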

From the log, all layers report that the available scratch is 0, and every TopK tactic wants scratch space, so the workspace needs to be increased.

2021-03-14T09:10:21.5757269Z [TensorRT] VERBOSE: --------------- Timing Runner: StatefulPartitionedCall/functional_3/tiny_yolov3/tf_op_layer_ArgMax/ArgMax (TopK)
2021-03-14T09:10:21.5758730Z [TensorRT] VERBOSE: Tactic: 0 skipped. Scratch requested: 147840, available: 0
2021-03-14T09:10:21.5760298Z [TensorRT] VERBOSE: Tactic: 1 skipped. Scratch requested: 147840, available: 0
2021-03-14T09:10:21.5761899Z [TensorRT] VERBOSE: Tactic: 3 skipped. Scratch requested: 147840, available: 0
2021-03-14T09:10:21.5763332Z [TensorRT] VERBOSE: Tactic: 2 skipped. Scratch requested: 147840, available: 0
2021-03-14T09:10:21.5765403Z [TensorRT] VERBOSE: Fastest Tactic: -3360065831133338131 Time: 3.40282e+38
2021-03-14T09:10:21.5766987Z [TensorRT] ERROR: Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
2021-03-14T09:10:21.5769973Z [TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node StatefulPartitionedCall/functional_3/tiny_yolov3/tf_op_layer_ArgMax/ArgMax.)
2021-03-14T09:10:21.5771944Z [TensorRT] VERBOSE: Builder timing cache: created 74 entries, 26 hit(s)
2021-03-14T09:10:21.5775088Z [TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node StatefulPartitionedCall/functional_3/tiny_yolov3/tf_op_layer_ArgMax/ArgMax.)

Thank you.

Hi @spolisetty
I fixed the workspace setting so that it is applied to the config instead of the builder:

config.max_workspace_size = 1 << 30

The attached logs describe several exports of TRT models with different precision/modes:

  1. a float32 model without DLA,
  2. a float16 model with DLA enabled.

The workspace-related error, together with the “DLA Node compilation Failed” warning, appears only for the float16 + DLA model.
I have no clue why this happens; I’m following this thread in the meantime:
Trtexec log problem and use DLA error on Jetson Xavier - #4 by disculus2012
But if anyone can advise how to solve this for a float16 model with DLA enabled, that would help, preferably without needing to update JetPack etc. (I use JetPack 4.4 on a Jetson AGX.)
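For what it’s worth, a hedged sketch of the FP16 + DLA builder-config settings (TensorRT 7.x Python API; the function name is my own), including GPU fallback so that layers the DLA cannot run, such as the ArgMax/TopK above, fall back to the GPU instead of failing the build:

```python
def configure_fp16_dla(config, dla_core=0):
    """Sketch: set up an IBuilderConfig for FP16 with DLA offload.

    Assumes TensorRT 7.x; `config` comes from builder.create_builder_config().
    """
    import tensorrt as trt  # requires a TensorRT install (e.g. via JetPack)

    config.max_workspace_size = 1 << 30            # workspace goes on the config
    config.set_flag(trt.BuilderFlag.FP16)          # DLA runs FP16/INT8 only
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # unsupported layers -> GPU
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = dla_core
    return config
```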

Hi @weissrael,

You may need to try this on a future release of JetPack; we have some fixes in the latest TensorRT 7.2 version.
We also recommend sharing the model and the relevant scripts for better debugging.

Thank you.