DLA_STANDALONE error in forceToUseNvmIO

Hi, I‘m trying to generate a DLA loadable alone for Resnet18.I get this error when I set config.engine_capability = trt.EngineCapability.DLA_STANDALONE:

[TRT] [E] 2: [nvmRegionOptimizer.cpp::forceToUseNvmIO::175] Error Code 2: Internal Error (Assertion std::all_of(a->consumers.begin(), a->consumers.end(), (Node* n) { return isDLA(n->backend); }) failed. )

Version:
Xavier NX
Jetpack 5.0.4

It should be JetPack 5.0.2, right?

Yes,JetPack 5.0.2. I get it wrong

Dear @Jhappy,
Have you tried using trtexec to get DLA engine?

Yes, I sucess to get a DLA trt plan with trtexec.
I alse tried polygraphy tool :

  • success : polygraphy convert -v resnet18_cut_sim.onnx --int8 -o temp.engine --use-dla
  • failed : polygraphy convert -v resnet18_cut_sim.onnx --int8 -o temp.engine --use-dla --engine-capability DLA_STANDALONE

Dear @SivaRamaKrishnaNV .
Do you get any idea of this error log?

Hi

We don’t have --engine-capability DLA_STANDALONE configure for v8.4.1 polygraphy.
Have you used the configuration on other platforms before?

$ polygraphy convert -h
usage: polygraphy convert [-h] [-v] [-q] [--silent] [--log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]] [--log-file LOG_FILE]
                          [--model-type {frozen,keras,ckpt,onnx,engine,uff,trt-network-script,caffe}] [--input-shapes INPUT_SHAPES [INPUT_SHAPES ...]] [--ckpt CKPT]
                          [--tf-outputs TF_OUTPUTS [TF_OUTPUTS ...]] [--freeze-graph] [--opset OPSET] [--no-const-folding] [--shape-inference] [--external-data-dir EXTERNAL_DATA_DIR]
                          [--onnx-outputs ONNX_OUTPUTS [ONNX_OUTPUTS ...]] [--onnx-exclude-outputs ONNX_EXCLUDE_OUTPUTS [ONNX_EXCLUDE_OUTPUTS ...]] [--save-external-data [EXTERNAL_DATA_PATH]]
                          [--external-data-size-threshold EXTERNAL_DATA_SIZE_THRESHOLD] [--no-save-all-tensors-to-one-file] [--seed SEED] [--val-range VAL_RANGE [VAL_RANGE ...]] [--int-min INT_MIN]
                          [--int-max INT_MAX] [--float-min FLOAT_MIN] [--float-max FLOAT_MAX] [--iterations NUM] [--load-inputs LOAD_INPUTS_PATHS [LOAD_INPUTS_PATHS ...] | --data-loader-script
                          DATA_LOADER_SCRIPT] [--data-loader-func-name DATA_LOADER_FUNC_NAME] [--trt-min-shapes TRT_MIN_SHAPES [TRT_MIN_SHAPES ...]] [--trt-opt-shapes TRT_OPT_SHAPES [TRT_OPT_SHAPES ...]]
                          [--trt-max-shapes TRT_MAX_SHAPES [TRT_MAX_SHAPES ...]] [--tf32] [--fp16] [--int8] [--precision-constraints {prefer,obey,none} | --obey-precision-constraints | --strict-types]
                          [--sparse-weights] [--workspace BYTES] [--calibration-cache CALIBRATION_CACHE] [--calib-base-cls CALIBRATION_BASE_CLASS] [--quantile QUANTILE]
                          [--regression-cutoff REGRESSION_CUTOFF] [--timing-cache TIMING_CACHE] [--load-timing-cache LOAD_TIMING_CACHE] [--save-tactics SAVE_TACTICS | --load-tactics LOAD_TACTICS]
                          [--tactic-sources [TACTIC_SOURCES [TACTIC_SOURCES ...]]] [--trt-config-script TRT_CONFIG_SCRIPT] [--trt-config-func-name TRT_CONFIG_FUNC_NAME] [--trt-safety-restricted]
                          [--use-dla] [--allow-gpu-fallback] [--pool-limit [MEMORY_POOL_LIMIT [MEMORY_POOL_LIMIT ...]]] [--plugins PLUGINS [PLUGINS ...]] [--explicit-precision]
                          [--trt-outputs TRT_OUTPUTS [TRT_OUTPUTS ...]] [--trt-exclude-outputs TRT_EXCLUDE_OUTPUTS [TRT_EXCLUDE_OUTPUTS ...]] [--trt-network-func-name TRT_NETWORK_FUNC_NAME]
                          [--save-timing-cache SAVE_TIMING_CACHE] -o OUTPUT [--convert-to {onnx,trt,onnx-like-trt-network}] [--fp-to-fp16]
                          model_file

Convert models to other formats.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Path to save the converted model
  --convert-to {onnx,trt,onnx-like-trt-network}
                        The format to attempt to convert the model to.'onnx-like-trt-network' is EXPERIMETNAL and converts a TensorRT network to a format usable for visualization. See
                        'OnnxLikeFromNetwork' for details.

Logging:
  Options related to logging and debug output

  -v, --verbose         Increase logging verbosity. Specify multiple times for higher verbosity
  -q, --quiet           Decrease logging verbosity. Specify multiple times for lower verbosity
  --silent              Disable all output
  --log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]
                        Format for log messages: {{'timestamp': Include timestamp, 'line-info': Include file and line number, 'no-colors': Disable colors}}
  --log-file LOG_FILE   Path to a file where Polygraphy logging output should be written. This will not include logging output from dependencies, like TensorRT or ONNX-Runtime.

Model:
  Options related to the model

  model_file            Path to the model
  --model-type {frozen,keras,ckpt,onnx,engine,uff,trt-network-script,caffe}
                        The type of the input model: {{'frozen': TensorFlow frozen graph, 'keras': Keras model, 'ckpt': TensorFlow checkpoint directory, 'onnx': ONNX model, 'engine': TensorRT engine,
                        'trt-network-script': A Python script that defines a `load_network` function that takes no arguments and returns a TensorRT Builder, Network, and optionally Parser, 'uff': UFF
                        file [deprecated], 'caffe': Caffe prototxt [deprecated]}}
  --input-shapes INPUT_SHAPES [INPUT_SHAPES ...], --inputs INPUT_SHAPES [INPUT_SHAPES ...]
                        Model input(s) and their shape(s). Used to determine shapes to use while generating input data for inference. Format: --input-shapes-shapes <name>:<shape>. For example: --input-
                        shapes-shapes image:[1,3,224,224] other_input:[10]

TensorFlow Model Loading:
  Options related to loading TensorFlow models.

  --ckpt CKPT           [EXPERIMENTAL] Name of the checkpoint to load. Required if the `checkpoint` file is missing. Should not include file extension (e.g. to load `model.meta` use `--ckpt=model`)
  --tf-outputs TF_OUTPUTS [TF_OUTPUTS ...]
                        Name(s) of TensorFlow output(s). Using '--tf-outputs mark all' indicates that all tensors should be used as outputs
  --freeze-graph        [EXPERIMENTAL] Attempt to freeze the graph

TensorFlow-ONNX Model Conversion:
  Options related to converting TensorFlow models to ONNX.

  --opset OPSET         Opset to use when converting to ONNX
  --no-const-folding    [DEPRECATED] Do not fold constants in the TensorFlow graph prior to conversion

ONNX Shape Inference:
  Options related to ONNX shape inference.

  --shape-inference     Enable ONNX shape inference when loading the model

ONNX Model Loading:
  Options related to loading ONNX models.

  --external-data-dir EXTERNAL_DATA_DIR, --load-external-data EXTERNAL_DATA_DIR, --ext EXTERNAL_DATA_DIR
                        Path to a directory containing external data for the model. Generally, this is only required if the external data is not stored in the model directory.
  --onnx-outputs ONNX_OUTPUTS [ONNX_OUTPUTS ...]
                        Name(s) of ONNX tensor(s) to mark as output(s). Using the special value 'mark all' indicates that all tensors should be used as outputs
  --onnx-exclude-outputs ONNX_EXCLUDE_OUTPUTS [ONNX_EXCLUDE_OUTPUTS ...]
                        [EXPERIMENTAL] Name(s) of ONNX output(s) to unmark as outputs.
  --fp-to-fp16          Convert all floating point tensors in an ONNX model to 16-bit precision. This is *not* needed in order to use TensorRT's fp16 precision, but may be useful for other backends.
                        Requires onnxmltools.

ONNX Model Saving:
  Options related to saving ONNX models.

  --save-external-data [EXTERNAL_DATA_PATH], --external-data-path [EXTERNAL_DATA_PATH]
                        Whether to save weight data in external file(s). To use a non-default path, supply the desired path as an argument. This is always a relative path; external data is always written
                        to the same directory as the model.
  --external-data-size-threshold EXTERNAL_DATA_SIZE_THRESHOLD
                        The size threshold, in bytes, above which tensor data will be stored in the external file. Tensors smaller that this threshold will remain in the ONNX file. Optionally, use a `K`,
                        `M`, or `G` suffix to indicate KiB, MiB, or GiB respectively. For example, `--external-data-size-threshold=16M` is equivalent to `--external-data-size-threshold=16777216`. Has no
                        effect if `--save-external-data` is not set.
  --no-save-all-tensors-to-one-file
                        Do not save all tensors to a single file when saving external data. Has no effect if `--save-external-data` is not set

Data Loader:
  Options related to loading or generating input data for inference.

  --seed SEED           Seed to use for random inputs
  --val-range VAL_RANGE [VAL_RANGE ...]
                        Range of values to generate in the data loader. To specify per-input ranges, use the format: --val-range <input_name>:[min,max]. If no input name is provided, the range is used
                        for any inputs not explicitly specified. For example: --val-range [0,1] inp0:[2,50] inp1:[3.0,4.6]
  --int-min INT_MIN     [DEPRECATED: Use --val-range] Minimum integer value for random integer inputs
  --int-max INT_MAX     [DEPRECATED: Use --val-range] Maximum integer value for random integer inputs
  --float-min FLOAT_MIN
                        [DEPRECATED: Use --val-range] Minimum float value for random float inputs
  --float-max FLOAT_MAX
                        [DEPRECATED: Use --val-range] Maximum float value for random float inputs
  --iterations NUM, --iters NUM
                        Number of inference iterations for which the default data loader should supply data
  --load-inputs LOAD_INPUTS_PATHS [LOAD_INPUTS_PATHS ...], --load-input-data LOAD_INPUTS_PATHS [LOAD_INPUTS_PATHS ...]
                        [EXPERIMENTAL] Path(s) to load inputs. The file(s) should be a JSON-ified List[Dict[str, numpy.ndarray]], i.e. a list where each element is the feed_dict for a single iteration.
                        When this option is used, all other data loader arguments are ignored.
  --data-loader-script DATA_LOADER_SCRIPT
                        Path to a Python script that defines a function that loads input data. The function should take no arguments and return a generator or iterable that yields input data (Dict[str,
                        np.ndarray]). When this option is used, all other data loader arguments are ignored.
  --data-loader-func-name DATA_LOADER_FUNC_NAME
                        When using a data-loader-script, this specifies the name of the function that loads data. Defaults to `load_data`.

TensorRT Builder Configuration:
  Options related to creating the TensorRT BuilderConfig.

  --trt-min-shapes TRT_MIN_SHAPES [TRT_MIN_SHAPES ...]
                        The minimum shapes the optimization profile(s) will support. Specify this option once for each profile. If not provided, inference-time input shapes are used. Format: --trt-min-
                        shapes <input0>:[D0,D1,..,DN] .. <inputN>:[D0,D1,..,DN]
  --trt-opt-shapes TRT_OPT_SHAPES [TRT_OPT_SHAPES ...]
                        The shapes for which the optimization profile(s) will be most performant. Specify this option once for each profile. If not provided, inference-time input shapes are used. Format:
                        --trt-opt-shapes <input0>:[D0,D1,..,DN] .. <inputN>:[D0,D1,..,DN]
  --trt-max-shapes TRT_MAX_SHAPES [TRT_MAX_SHAPES ...]
                        The maximum shapes the optimization profile(s) will support. Specify this option once for each profile. If not provided, inference-time input shapes are used. Format: --trt-max-
                        shapes <input0>:[D0,D1,..,DN] .. <inputN>:[D0,D1,..,DN]
  --tf32                Enable tf32 precision in TensorRT
  --fp16                Enable fp16 precision in TensorRT
  --int8                Enable int8 precision in TensorRT. If calibration is required but no calibration cache is provided, this option will cause TensorRT to run int8 calibration using the Polygraphy
                        data loader to provide calibration data.
  --precision-constraints {prefer,obey,none}
                        If set to `prefer`, TensorRT will restrict available tactics to layer precisions specified in the network unless no implementation exists with the preferred layer constraints, in
                        which case it will issue a warning and use the fastest available implementation. If set to `obey`, TensorRT will instead fail to build the network if no implementation exists with
                        the preferred layer constraints. Defaults to `none`
  --obey-precision-constraints
                        [DEPRECATED - use --precision-constraints] Enable enforcing precision constraints in TensorRT, forcing it to use tactics based on the layer precision set, even if another
                        precision is faster. Build fails if such an engine cannot be built.
  --strict-types        [DEPRECATED - use --precision-constraints] Enable preference for precision constraints and avoidance of I/O reformatting in TensorRT, and fall back to ignoring the request if such
                        an engine cannot be built.
  --sparse-weights      Enable optimizations for sparse weights in TensorRT
  --workspace BYTES     [DEPRECATED - use --pool-limit] Amount of memory, in bytes, to allocate for the TensorRT builder's workspace. Optionally, use a `K`, `M`, or `G` suffix to indicate KiB, MiB, or
                        GiB respectively. For example, `--workspace=16M` is equivalent to `--workspace=16777216`.
  --calibration-cache CALIBRATION_CACHE
                        Path to load/save a calibration cache. Used to store calibration scales to speed up the process of int8 calibration. If the provided path does not yet exist, int8 calibration
                        scales will be calculated and written to it during engine building. If the provided path does exist, it will be read and int8 calibration will be skipped during engine building.
  --calib-base-cls CALIBRATION_BASE_CLASS, --calibration-base-class CALIBRATION_BASE_CLASS
                        The name of the calibration base class to use. For example, 'IInt8MinMaxCalibrator'.
  --quantile QUANTILE   The quantile to use for IInt8LegacyCalibrator. Has no effect for other calibrator types.
  --regression-cutoff REGRESSION_CUTOFF
                        The regression cutoff to use for IInt8LegacyCalibrator. Has no effect for other calibrator types.
  --timing-cache TIMING_CACHE
                        [DEPRECATED - use --load-timing-cache/--save-timing-cache] Path to load/save tactic timing cache. Used to cache tactic timing information to speed up the engine building process.
                        Existing caches will be appended to with any new timing information gathered.
  --load-timing-cache LOAD_TIMING_CACHE
                        Path to load tactic timing cache. Used to cache tactic timing information to speed up the engine building process.
  --save-tactics SAVE_TACTICS
                        Path to save a Polygraphy tactic replay file. Details about tactics selected by TensorRT will be recorded and stored at this location as a JSON file.
  --load-tactics LOAD_TACTICS
                        Path to load a Polygraphy tactic replay file, such as one created by --save-tactics. The tactics specified in the file will be used to override TensorRT's default selections.
  --tactic-sources [TACTIC_SOURCES [TACTIC_SOURCES ...]]
                        Tactic sources to enable. This controls which libraries (e.g. cudnn, cublas, etc.) TensorRT is allowed to load tactics from. Values come from the names of the values in the
                        trt.TacticSource enum and are case-insensitive. If no arguments are provided, e.g. '--tactic-sources', then all tactic sources are disabled.
  --trt-config-script TRT_CONFIG_SCRIPT
                        Path to a Python script that defines a function that creates a TensorRT IBuilderConfig. The function should take a builder and network as parameters and return a TensorRT builder
                        configuration. When this option is specified, all other config arguments are ignored.
  --trt-config-func-name TRT_CONFIG_FUNC_NAME
                        When using a trt-config-script, this specifies the name of the function that creates the config. Defaults to `load_config`.
  --trt-safety-restricted
                        Enable safety scope checking in TensorRT
  --use-dla             [EXPERIMENTAL] Use DLA as the default device type
  --allow-gpu-fallback  [EXPERIMENTAL] Allow layers unsupported on the DLA to fall back to GPU. Has no effect if --dla is not set.
  --pool-limit [MEMORY_POOL_LIMIT [MEMORY_POOL_LIMIT ...]], --memory-pool-limit [MEMORY_POOL_LIMIT [MEMORY_POOL_LIMIT ...]]
                        Set memory pool limits. Memory pool names come from the names of values in the trt.MemoryPoolType enum and are case-insensitiveFormat: `--pool-limit <pool_name>:<pool_limit> ...`.
                        For example, `--pool-limit dla_local_dram:1e9 workspace:16777216`. Optionally, use a `K`, `M`, or `G` suffix to indicate KiB, MiB, or GiB respectively. For example, `--pool-limit
                        workspace:16M` is equivalent to `--pool-limit workspace:16777216`.

TensorRT Plugin Loading:
  Options related to loading TensorRT plugins.

  --plugins PLUGINS [PLUGINS ...]
                        Path(s) of plugin libraries to load

TensorRT Network Loading:
  Options related to loading TensorRT networks.

  --explicit-precision  [DEPRECATED] Enable explicit precision mode
  --trt-outputs TRT_OUTPUTS [TRT_OUTPUTS ...]
                        Name(s) of TensorRT output(s). Using '--trt-outputs mark all' indicates that all tensors should be used as outputs
  --trt-exclude-outputs TRT_EXCLUDE_OUTPUTS [TRT_EXCLUDE_OUTPUTS ...]
                        [EXPERIMENTAL] Name(s) of TensorRT output(s) to unmark as outputs.
  --trt-network-func-name TRT_NETWORK_FUNC_NAME
                        When using a trt-network-script instead of other model types, this specifies the name of the function that loads the network. Defaults to `load_network`.

TensorRT Engine:
  Options related to loading TensorRT engines.

  --save-timing-cache SAVE_TIMING_CACHE
                        Path to save tactic timing cache if building an engine. Existing caches will be appended to with any new timing information gathered.

TensorRT Engine Saving:
  Options related to saving TensorRT engines.

Thanks.

Thanks for reply.
This is my polygraphy output. There is –engine-capability:

usage: polygraphy convert [-h] [-v] [-q] [--verbosity VERBOSITY [VERBOSITY ...]] [--silent]
                          [--log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]] [--log-file LOG_FILE]
                          [--model-type {frozen,keras,ckpt,onnx,engine,uff,trt-network-script,caffe}] [--input-shapes INPUT_SHAPES [INPUT_SHAPES ...]]
                          [--ckpt CKPT] [--tf-outputs TF_OUTPUTS [TF_OUTPUTS ...]] [--freeze-graph] [--opset OPSET] [--shape-inference]
                          [--no-onnxruntime-shape-inference] [--external-data-dir EXTERNAL_DATA_DIR] [--ignore-external-data]
                          [--onnx-outputs ONNX_OUTPUTS [ONNX_OUTPUTS ...]] [--onnx-exclude-outputs ONNX_EXCLUDE_OUTPUTS [ONNX_EXCLUDE_OUTPUTS ...]]
                          [--save-external-data [EXTERNAL_DATA_PATH]] [--external-data-size-threshold EXTERNAL_DATA_SIZE_THRESHOLD]
                          [--no-save-all-tensors-to-one-file] [--seed SEED] [--val-range VAL_RANGE [VAL_RANGE ...]] [--int-min INT_MIN] [--int-max INT_MAX]
                          [--float-min FLOAT_MIN] [--float-max FLOAT_MAX] [--iterations NUM] [--load-inputs LOAD_INPUTS_PATHS [LOAD_INPUTS_PATHS ...] |
                          --data-loader-script DATA_LOADER_SCRIPT] [--data-loader-func-name DATA_LOADER_FUNC_NAME]
                          [--trt-min-shapes TRT_MIN_SHAPES [TRT_MIN_SHAPES ...]] [--trt-opt-shapes TRT_OPT_SHAPES [TRT_OPT_SHAPES ...]]
                          [--trt-max-shapes TRT_MAX_SHAPES [TRT_MAX_SHAPES ...]] [--tf32] [--fp16] [--int8]
                          [--precision-constraints {prefer,obey,none} | --strict-types] [--sparse-weights] [--workspace BYTES]
                          [--calibration-cache CALIBRATION_CACHE] [--calib-base-cls CALIBRATION_BASE_CLASS] [--quantile QUANTILE]
                          [--regression-cutoff REGRESSION_CUTOFF] [--timing-cache TIMING_CACHE] [--load-timing-cache LOAD_TIMING_CACHE]
                          [--save-tactics SAVE_TACTICS | --load-tactics LOAD_TACTICS] [--tactic-sources [TACTIC_SOURCES [TACTIC_SOURCES ...]]]
                          [--trt-config-script TRT_CONFIG_SCRIPT] [--trt-config-func-name TRT_CONFIG_FUNC_NAME] [--trt-safety-restricted] [--refittable]
                          [--use-dla] [--allow-gpu-fallback] [--pool-limit MEMORY_POOL_LIMIT [MEMORY_POOL_LIMIT ...]]
                          [--preview-features PREVIEW_FEATURES [PREVIEW_FEATURES ...]] [--engine-capability ENGINE_CAPABILITY] [--direct-io]
                          [--plugins PLUGINS [PLUGINS ...]] [--trt-outputs TRT_OUTPUTS [TRT_OUTPUTS ...]]
                          [--trt-exclude-outputs TRT_EXCLUDE_OUTPUTS [TRT_EXCLUDE_OUTPUTS ...]] [--layer-precisions LAYER_PRECISIONS [LAYER_PRECISIONS ...]]
                          [--tensor-dtypes TENSOR_DTYPES [TENSOR_DTYPES ...]] [--tensor-formats TENSOR_FORMATS [TENSOR_FORMATS ...]]
                          [--trt-network-func-name TRT_NETWORK_FUNC_NAME] [--save-timing-cache SAVE_TIMING_CACHE] -o OUTPUT
                          [--convert-to {onnx,trt,onnx-like-trt-network}] [--fp-to-fp16]
                          model_file

The output version is:

$ polygraphy --version
Polygraphy | Version: 0.42.2

Do you mean tensorrt v8.4.1 have no --engine-capability DLA_STANDALONE` configure ?

Hi,

Which TensorRT OSS branch do you use?
We test the 8.4.1 for compatibility of JetPack 5.0.2 TensorRT.

Thanks.

Hi~
I clone TensorRT OSS from master branch at commit 7d2a70a18bb0e5…
Thanks~

Hi,

Since JetPack includes TensorRT 8.4, please use the same branches instead.
For a newer version(ex. v8.5), please wait for our future release.

Thanks.

Thanks .I will try that.
In the other hand ,I try to use python api to convert DLA_STANDALONE by

config.engine_capability = trt.EngineCapability.DLA_STANDALONE

I meet the same error.
Do you have any example file for that?I cannot find in the document.

It seems you want to generate only DLA compatible model(no layer should be offloaded to GPU). The Error indicate, your model has non DLA compatible layers. Could you share the used trtexec command and trtexec output log for confirmation?

Sorry for my late reply. These days I cannot access the jetson at company.So I can not upload the log. In my model I cut all layer in resnet18 but the 1st convolution.

Yes I want to generate only DLA compatible model(no layer should be offloaded to GPU).I succeed use trtexec to get DLA trt plan(with allow fallback to gpu).
But I failed using polygraphy and PYTHON script togenerate only DLA compatible model. As @AastaLLL said maybe it’s caused by my version mismatch.I will try that after I can access the jetson box.