Hi,
We don't have an --engine-capability DLA_STANDALONE option to configure in Polygraphy v8.4.1.
Have you used this configuration on other platforms before?
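In case it helps frame the question: since the help output below lists a --trt-config-script option, the workaround we are considering is setting the capability from a config script instead. This is only a minimal sketch, assuming TensorRT 8.4's Python API exposes trt.EngineCapability.DLA_STANDALONE on IBuilderConfig; the file name dla_config.py is ours:

# dla_config.py -- hypothetical; invoked roughly as:
#   polygraphy convert model.onnx --convert-to trt -o model.engine --trt-config-script dla_config.py
import tensorrt as trt

def load_config(builder, network):
    # Polygraphy passes in the builder and network it created, and looks for
    # a function named `load_config` by default (see --trt-config-func-name).
    config = builder.create_builder_config()
    config.engine_capability = trt.EngineCapability.DLA_STANDALONE  # assumption: available in 8.4
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = 0
    config.set_flag(trt.BuilderFlag.FP16)  # DLA layers require fp16 or int8
    return config

Is that the intended way to get DLA_STANDALONE on this version, or is there an equivalent flag we are missing?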
$ polygraphy convert -h
usage: polygraphy convert [-h] [-v] [-q] [--silent] [--log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]] [--log-file LOG_FILE]
[--model-type {frozen,keras,ckpt,onnx,engine,uff,trt-network-script,caffe}] [--input-shapes INPUT_SHAPES [INPUT_SHAPES ...]] [--ckpt CKPT]
[--tf-outputs TF_OUTPUTS [TF_OUTPUTS ...]] [--freeze-graph] [--opset OPSET] [--no-const-folding] [--shape-inference] [--external-data-dir EXTERNAL_DATA_DIR]
[--onnx-outputs ONNX_OUTPUTS [ONNX_OUTPUTS ...]] [--onnx-exclude-outputs ONNX_EXCLUDE_OUTPUTS [ONNX_EXCLUDE_OUTPUTS ...]] [--save-external-data [EXTERNAL_DATA_PATH]]
[--external-data-size-threshold EXTERNAL_DATA_SIZE_THRESHOLD] [--no-save-all-tensors-to-one-file] [--seed SEED] [--val-range VAL_RANGE [VAL_RANGE ...]] [--int-min INT_MIN]
[--int-max INT_MAX] [--float-min FLOAT_MIN] [--float-max FLOAT_MAX] [--iterations NUM] [--load-inputs LOAD_INPUTS_PATHS [LOAD_INPUTS_PATHS ...] | --data-loader-script
DATA_LOADER_SCRIPT] [--data-loader-func-name DATA_LOADER_FUNC_NAME] [--trt-min-shapes TRT_MIN_SHAPES [TRT_MIN_SHAPES ...]] [--trt-opt-shapes TRT_OPT_SHAPES [TRT_OPT_SHAPES ...]]
[--trt-max-shapes TRT_MAX_SHAPES [TRT_MAX_SHAPES ...]] [--tf32] [--fp16] [--int8] [--precision-constraints {prefer,obey,none} | --obey-precision-constraints | --strict-types]
[--sparse-weights] [--workspace BYTES] [--calibration-cache CALIBRATION_CACHE] [--calib-base-cls CALIBRATION_BASE_CLASS] [--quantile QUANTILE]
[--regression-cutoff REGRESSION_CUTOFF] [--timing-cache TIMING_CACHE] [--load-timing-cache LOAD_TIMING_CACHE] [--save-tactics SAVE_TACTICS | --load-tactics LOAD_TACTICS]
[--tactic-sources [TACTIC_SOURCES [TACTIC_SOURCES ...]]] [--trt-config-script TRT_CONFIG_SCRIPT] [--trt-config-func-name TRT_CONFIG_FUNC_NAME] [--trt-safety-restricted]
[--use-dla] [--allow-gpu-fallback] [--pool-limit [MEMORY_POOL_LIMIT [MEMORY_POOL_LIMIT ...]]] [--plugins PLUGINS [PLUGINS ...]] [--explicit-precision]
[--trt-outputs TRT_OUTPUTS [TRT_OUTPUTS ...]] [--trt-exclude-outputs TRT_EXCLUDE_OUTPUTS [TRT_EXCLUDE_OUTPUTS ...]] [--trt-network-func-name TRT_NETWORK_FUNC_NAME]
[--save-timing-cache SAVE_TIMING_CACHE] -o OUTPUT [--convert-to {onnx,trt,onnx-like-trt-network}] [--fp-to-fp16]
model_file
Convert models to other formats.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Path to save the converted model
--convert-to {onnx,trt,onnx-like-trt-network}
The format to attempt to convert the model to. 'onnx-like-trt-network' is EXPERIMENTAL and converts a TensorRT network to a format usable for visualization. See
'OnnxLikeFromNetwork' for details.
Logging:
Options related to logging and debug output
-v, --verbose Increase logging verbosity. Specify multiple times for higher verbosity
-q, --quiet Decrease logging verbosity. Specify multiple times for lower verbosity
--silent Disable all output
--log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]
Format for log messages: {'timestamp': Include timestamp, 'line-info': Include file and line number, 'no-colors': Disable colors}
--log-file LOG_FILE Path to a file where Polygraphy logging output should be written. This will not include logging output from dependencies, like TensorRT or ONNX-Runtime.
Model:
Options related to the model
model_file Path to the model
--model-type {frozen,keras,ckpt,onnx,engine,uff,trt-network-script,caffe}
The type of the input model: {'frozen': TensorFlow frozen graph, 'keras': Keras model, 'ckpt': TensorFlow checkpoint directory, 'onnx': ONNX model, 'engine': TensorRT engine,
'trt-network-script': A Python script that defines a `load_network` function that takes no arguments and returns a TensorRT Builder, Network, and optionally Parser, 'uff': UFF
file [deprecated], 'caffe': Caffe prototxt [deprecated]}
--input-shapes INPUT_SHAPES [INPUT_SHAPES ...], --inputs INPUT_SHAPES [INPUT_SHAPES ...]
Model input(s) and their shape(s). Used to determine shapes to use while generating input data for inference. Format: --input-shapes <name>:<shape>. For example:
--input-shapes image:[1,3,224,224] other_input:[10]
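For reference, a minimal trt-network-script matching the interface described above for --model-type might look like the following sketch (the identity network and input name are placeholders, assuming TensorRT 8.x's Python API):

import tensorrt as trt

def load_network():
    # Polygraphy looks for this function name by default (see --trt-network-func-name).
    builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    inp = network.add_input("x", trt.float32, (1, 3, 224, 224))
    layer = network.add_identity(inp)  # placeholder; a real script builds the full network
    network.mark_output(layer.get_output(0))
    return builder, network  # the optional Parser is omitted here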
TensorFlow Model Loading:
Options related to loading TensorFlow models.
--ckpt CKPT [EXPERIMENTAL] Name of the checkpoint to load. Required if the `checkpoint` file is missing. Should not include file extension (e.g. to load `model.meta` use `--ckpt=model`)
--tf-outputs TF_OUTPUTS [TF_OUTPUTS ...]
Name(s) of TensorFlow output(s). Using '--tf-outputs mark all' indicates that all tensors should be used as outputs
--freeze-graph [EXPERIMENTAL] Attempt to freeze the graph
TensorFlow-ONNX Model Conversion:
Options related to converting TensorFlow models to ONNX.
--opset OPSET Opset to use when converting to ONNX
--no-const-folding [DEPRECATED] Do not fold constants in the TensorFlow graph prior to conversion
ONNX Shape Inference:
Options related to ONNX shape inference.
--shape-inference Enable ONNX shape inference when loading the model
ONNX Model Loading:
Options related to loading ONNX models.
--external-data-dir EXTERNAL_DATA_DIR, --load-external-data EXTERNAL_DATA_DIR, --ext EXTERNAL_DATA_DIR
Path to a directory containing external data for the model. Generally, this is only required if the external data is not stored in the model directory.
--onnx-outputs ONNX_OUTPUTS [ONNX_OUTPUTS ...]
Name(s) of ONNX tensor(s) to mark as output(s). Using the special value 'mark all' indicates that all tensors should be used as outputs
--onnx-exclude-outputs ONNX_EXCLUDE_OUTPUTS [ONNX_EXCLUDE_OUTPUTS ...]
[EXPERIMENTAL] Name(s) of ONNX output(s) to unmark as outputs.
--fp-to-fp16 Convert all floating point tensors in an ONNX model to 16-bit precision. This is *not* needed in order to use TensorRT's fp16 precision, but may be useful for other backends.
Requires onnxmltools.
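Since --fp-to-fp16 requires onnxmltools, the equivalent standalone conversion is roughly the following sketch (file names are hypothetical; assumes onnxmltools' float16 converter):

import onnx
from onnxmltools.utils.float16_converter import convert_float_to_float16

model = onnx.load("model.onnx")            # hypothetical input path
model_fp16 = convert_float_to_float16(model)
onnx.save(model_fp16, "model_fp16.onnx")   # hypothetical output path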
ONNX Model Saving:
Options related to saving ONNX models.
--save-external-data [EXTERNAL_DATA_PATH], --external-data-path [EXTERNAL_DATA_PATH]
Whether to save weight data in external file(s). To use a non-default path, supply the desired path as an argument. This is always a relative path; external data is always written
to the same directory as the model.
--external-data-size-threshold EXTERNAL_DATA_SIZE_THRESHOLD
The size threshold, in bytes, above which tensor data will be stored in the external file. Tensors smaller than this threshold will remain in the ONNX file. Optionally, use a `K`,
`M`, or `G` suffix to indicate KiB, MiB, or GiB respectively. For example, `--external-data-size-threshold=16M` is equivalent to `--external-data-size-threshold=16777216`. Has no
effect if `--save-external-data` is not set.
--no-save-all-tensors-to-one-file
Do not save all tensors to a single file when saving external data. Has no effect if `--save-external-data` is not set
Data Loader:
Options related to loading or generating input data for inference.
--seed SEED Seed to use for random inputs
--val-range VAL_RANGE [VAL_RANGE ...]
Range of values to generate in the data loader. To specify per-input ranges, use the format: --val-range <input_name>:[min,max]. If no input name is provided, the range is used
for any inputs not explicitly specified. For example: --val-range [0,1] inp0:[2,50] inp1:[3.0,4.6]
--int-min INT_MIN [DEPRECATED: Use --val-range] Minimum integer value for random integer inputs
--int-max INT_MAX [DEPRECATED: Use --val-range] Maximum integer value for random integer inputs
--float-min FLOAT_MIN
[DEPRECATED: Use --val-range] Minimum float value for random float inputs
--float-max FLOAT_MAX
[DEPRECATED: Use --val-range] Maximum float value for random float inputs
--iterations NUM, --iters NUM
Number of inference iterations for which the default data loader should supply data
--load-inputs LOAD_INPUTS_PATHS [LOAD_INPUTS_PATHS ...], --load-input-data LOAD_INPUTS_PATHS [LOAD_INPUTS_PATHS ...]
[EXPERIMENTAL] Path(s) to load inputs. The file(s) should be a JSON-ified List[Dict[str, numpy.ndarray]], i.e. a list where each element is the feed_dict for a single iteration.
When this option is used, all other data loader arguments are ignored.
--data-loader-script DATA_LOADER_SCRIPT
Path to a Python script that defines a function that loads input data. The function should take no arguments and return a generator or iterable that yields input data (Dict[str,
np.ndarray]). When this option is used, all other data loader arguments are ignored.
--data-loader-func-name DATA_LOADER_FUNC_NAME
When using a data-loader-script, this specifies the name of the function that loads data. Defaults to `load_data`.
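A minimal data-loader script matching the interface described above would look roughly like this sketch (the input name, shape, and iteration count are made up):

# data_loader.py -- pass via: --data-loader-script data_loader.py
import numpy as np

def load_data():
    # Polygraphy looks for this function name by default (see --data-loader-func-name).
    for _ in range(4):  # arbitrary iteration count
        # One feed_dict per iteration: input name -> numpy array.
        yield {"x": np.random.rand(1, 3, 224, 224).astype(np.float32)}

A list of such feed_dicts is also the structure --load-inputs expects on disk; we assume Polygraphy's save_json utility (from polygraphy.json) can serialize it.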
TensorRT Builder Configuration:
Options related to creating the TensorRT BuilderConfig.
--trt-min-shapes TRT_MIN_SHAPES [TRT_MIN_SHAPES ...]
The minimum shapes the optimization profile(s) will support. Specify this option once for each profile. If not provided, inference-time input shapes are used. Format:
--trt-min-shapes <input0>:[D0,D1,..,DN] .. <inputN>:[D0,D1,..,DN]
--trt-opt-shapes TRT_OPT_SHAPES [TRT_OPT_SHAPES ...]
The shapes for which the optimization profile(s) will be most performant. Specify this option once for each profile. If not provided, inference-time input shapes are used. Format:
--trt-opt-shapes <input0>:[D0,D1,..,DN] .. <inputN>:[D0,D1,..,DN]
--trt-max-shapes TRT_MAX_SHAPES [TRT_MAX_SHAPES ...]
The maximum shapes the optimization profile(s) will support. Specify this option once for each profile. If not provided, inference-time input shapes are used. Format:
--trt-max-shapes <input0>:[D0,D1,..,DN] .. <inputN>:[D0,D1,..,DN]
--tf32 Enable tf32 precision in TensorRT
--fp16 Enable fp16 precision in TensorRT
--int8 Enable int8 precision in TensorRT. If calibration is required but no calibration cache is provided, this option will cause TensorRT to run int8 calibration using the Polygraphy
data loader to provide calibration data.
--precision-constraints {prefer,obey,none}
If set to `prefer`, TensorRT will restrict available tactics to layer precisions specified in the network unless no implementation exists with the preferred layer constraints, in
which case it will issue a warning and use the fastest available implementation. If set to `obey`, TensorRT will instead fail to build the network if no implementation exists with
the preferred layer constraints. Defaults to `none`
--obey-precision-constraints
[DEPRECATED - use --precision-constraints] Enable enforcing precision constraints in TensorRT, forcing it to use tactics based on the layer precision set, even if another
precision is faster. Build fails if such an engine cannot be built.
--strict-types [DEPRECATED - use --precision-constraints] Enable preference for precision constraints and avoidance of I/O reformatting in TensorRT, and fall back to ignoring the request if such
an engine cannot be built.
--sparse-weights Enable optimizations for sparse weights in TensorRT
--workspace BYTES [DEPRECATED - use --pool-limit] Amount of memory, in bytes, to allocate for the TensorRT builder's workspace. Optionally, use a `K`, `M`, or `G` suffix to indicate KiB, MiB, or
GiB respectively. For example, `--workspace=16M` is equivalent to `--workspace=16777216`.
--calibration-cache CALIBRATION_CACHE
Path to load/save a calibration cache. Used to store calibration scales to speed up the process of int8 calibration. If the provided path does not yet exist, int8 calibration
scales will be calculated and written to it during engine building. If the provided path does exist, it will be read and int8 calibration will be skipped during engine building.
--calib-base-cls CALIBRATION_BASE_CLASS, --calibration-base-class CALIBRATION_BASE_CLASS
The name of the calibration base class to use. For example, 'IInt8MinMaxCalibrator'.
--quantile QUANTILE The quantile to use for IInt8LegacyCalibrator. Has no effect for other calibrator types.
--regression-cutoff REGRESSION_CUTOFF
The regression cutoff to use for IInt8LegacyCalibrator. Has no effect for other calibrator types.
--timing-cache TIMING_CACHE
[DEPRECATED - use --load-timing-cache/--save-timing-cache] Path to load/save tactic timing cache. Used to cache tactic timing information to speed up the engine building process.
Existing caches will be appended to with any new timing information gathered.
--load-timing-cache LOAD_TIMING_CACHE
Path to load tactic timing cache. Used to cache tactic timing information to speed up the engine building process.
--save-tactics SAVE_TACTICS
Path to save a Polygraphy tactic replay file. Details about tactics selected by TensorRT will be recorded and stored at this location as a JSON file.
--load-tactics LOAD_TACTICS
Path to load a Polygraphy tactic replay file, such as one created by --save-tactics. The tactics specified in the file will be used to override TensorRT's default selections.
--tactic-sources [TACTIC_SOURCES [TACTIC_SOURCES ...]]
Tactic sources to enable. This controls which libraries (e.g. cudnn, cublas, etc.) TensorRT is allowed to load tactics from. Values come from the names of the values in the
trt.TacticSource enum and are case-insensitive. If no arguments are provided, e.g. '--tactic-sources', then all tactic sources are disabled.
--trt-config-script TRT_CONFIG_SCRIPT
Path to a Python script that defines a function that creates a TensorRT IBuilderConfig. The function should take a builder and network as parameters and return a TensorRT builder
configuration. When this option is specified, all other config arguments are ignored.
--trt-config-func-name TRT_CONFIG_FUNC_NAME
When using a trt-config-script, this specifies the name of the function that creates the config. Defaults to `load_config`.
--trt-safety-restricted
Enable safety scope checking in TensorRT
--use-dla [EXPERIMENTAL] Use DLA as the default device type
--allow-gpu-fallback [EXPERIMENTAL] Allow layers unsupported on the DLA to fall back to GPU. Has no effect if --use-dla is not set.
--pool-limit [MEMORY_POOL_LIMIT [MEMORY_POOL_LIMIT ...]], --memory-pool-limit [MEMORY_POOL_LIMIT [MEMORY_POOL_LIMIT ...]]
Set memory pool limits. Memory pool names come from the names of values in the trt.MemoryPoolType enum and are case-insensitive. Format: `--pool-limit <pool_name>:<pool_limit> ...`.
For example, `--pool-limit dla_local_dram:1e9 workspace:16777216`. Optionally, use a `K`, `M`, or `G` suffix to indicate KiB, MiB, or GiB respectively. For example, `--pool-limit
workspace:16M` is equivalent to `--pool-limit workspace:16777216`.
TensorRT Plugin Loading:
Options related to loading TensorRT plugins.
--plugins PLUGINS [PLUGINS ...]
Path(s) of plugin libraries to load
TensorRT Network Loading:
Options related to loading TensorRT networks.
--explicit-precision [DEPRECATED] Enable explicit precision mode
--trt-outputs TRT_OUTPUTS [TRT_OUTPUTS ...]
Name(s) of TensorRT output(s). Using '--trt-outputs mark all' indicates that all tensors should be used as outputs
--trt-exclude-outputs TRT_EXCLUDE_OUTPUTS [TRT_EXCLUDE_OUTPUTS ...]
[EXPERIMENTAL] Name(s) of TensorRT output(s) to unmark as outputs.
--trt-network-func-name TRT_NETWORK_FUNC_NAME
When using a trt-network-script instead of other model types, this specifies the name of the function that loads the network. Defaults to `load_network`.
TensorRT Engine:
Options related to loading TensorRT engines.
--save-timing-cache SAVE_TIMING_CACHE
Path to save tactic timing cache if building an engine. Existing caches will be appended to with any new timing information gathered.
TensorRT Engine Saving:
Options related to saving TensorRT engines.
Thanks.