Tlt-converter out of memory

ms32035 · March 4, 2021, 10:02am

I am running the latest tlt-converter (cuda102-trt71-jp45) with the -w parameter on a Jetson Xavier NX, and I noticed that at higher memory settings, such as 3-4 GB or more, it fails with a memory error reporting that 0 bytes of memory are available.

[ERROR] Internal error: plugin node BatchedNMS requires 1029376 bytes of scratch space, but only 0 is available

The issue does not happen with the default value of 1<<30. In the past I encountered the same problem when building models in TensorRT: when the input parameter to the program exceeded the maximum value of a 32-bit integer, setMaxWorkspaceSize received 0. Can you please verify that 64-bit integers are handled correctly in the input parameters of tlt-converter?
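To illustrate the suspected failure mode, here is a minimal Python sketch of what happens to a workspace size if only its low 32 bits survive somewhere along the way. This is an assumption about the cause, not confirmed behaviour of tlt-converter, and the helper names as_uint32 and as_int32 are hypothetical.

```python
GIB = 1 << 30  # one gibibyte, the same unit as the default -w value (1<<30)


def as_uint32(value: int) -> int:
    """Keep only the low 32 bits, as a conversion to unsigned 32-bit would."""
    return value & 0xFFFFFFFF


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits as a signed two's-complement integer."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


# Compare the requested size with what a 32-bit variable would hold.
for gib in (1, 3, 4, 8, 16):
    requested = gib * GIB
    print(
        f"{gib:>2} GiB  requested={requested:>11}  "
        f"uint32={as_uint32(requested):>10}  int32={as_int32(requested):>11}"
    )
```

Under this assumption, any exact multiple of 4 GiB (4, 8, 16 GiB) collapses to 0, which would be consistent with the "only 0 is available" message. A 3 GiB request does not collapse to 0; it stays intact as an unsigned value but turns negative when read as a signed one, so how a real tool would treat it depends on code not shown here.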

Morganh · March 4, 2021, 3:48pm

Can you share your command and the full log from running tlt-converter?

ms32035 · March 4, 2021, 4:16pm

./tlt-converter -t fp16 -d 3,288,512 -k key model.etlt -o BatchedNMS -w 17179869184 -m 1

[INFO]
[INFO] --------------- Layers running on DLA:
[INFO]
[INFO] --------------- Layers running on GPU:
[INFO] conv1/convolution + conv1_mish/Relu6, maxpool_1/MaxPool, <<<TRUNCATED>>> , BatchedNMS,
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 4 output network tensors.
[ERROR] Internal error: plugin node BatchedNMSrequires 1029376 bytes of scratch space, but only 0 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[ERROR] ../builder/cudnnBuilder2.cpp (1118) - OutOfMemory Error in checkPluginScratchSize: 0
[ERROR] Unable to create engine
Segmentation fault (core dumped)

This was set to 16 GB and run on an AGX with 32 GB, but the same issue occurs with lower -w values.

Morganh · March 4, 2021, 4:34pm

Can you try “-w 100000000” or “-w 1000000000” ?
Reference: Tutorial Spec Error: Message type "RegularizerConfig" has no field named "reg_type" - #2 by Morganh
TLT Converter Fails - #2 by Morganh

ms32035 · March 4, 2021, 4:47pm

1000000000 works, but that is not even 1 GB: it is less than the default (1<<30 = 1073741824) and still within the 32-bit integer range. I am still getting the warning:

[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output

In the past we observed that TensorRT models built with larger workspaces (4 or 8 GB) generated slightly different results at inference.
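The size comparison in this post can be checked with a short Python sketch. It only does arithmetic on the values quoted in the thread; the helper fits_in_int32 is a hypothetical name, and nothing here reflects how tlt-converter parses -w internally.

```python
INT32_MAX = (1 << 31) - 1    # largest signed 32-bit value: 2147483647
DEFAULT_WORKSPACE = 1 << 30  # default -w value quoted in this thread


def fits_in_int32(size: int) -> bool:
    """Return True if a non-negative size can be stored in a signed 32-bit integer."""
    return 0 <= size <= INT32_MAX


# The -w values that appear in this thread.
candidates = {
    "-w 100000000": 100_000_000,
    "-w 1000000000": 1_000_000_000,
    "default (1<<30)": DEFAULT_WORKSPACE,
    "-w 17179869184": 17_179_869_184,
}

for label, size in candidates.items():
    ratio = size / DEFAULT_WORKSPACE
    print(f"{label:<16} {ratio:6.2f}x default  fits in int32: {fits_in_int32(size)}")
```

Both suggested values are smaller than the default and fit in 32 bits, so getting them to work says nothing about sizes above the 32-bit range; only the 16 GB value exceeds it.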

Morganh · March 4, 2021, 4:51pm

Please ignore the [INFO] log. Please check whether the new engine file is available in your directory.

ms32035 · March 4, 2021, 4:57pm

Yes, the new engine is there. I already knew it would work at 1 GB, and that is not the issue here. The issue is that models generated at different workspace memory settings can differ from each other, and the tool does not work correctly with larger workspaces.

ms32035 · March 8, 2021, 11:25pm

@Morganh has the issue been verified as I asked?

Morganh · March 9, 2021, 1:57am

The -w value cannot be set very large; otherwise, no memory is left for other applications.
I will verify your case later.

Morganh · March 16, 2021, 3:27am

The default -w value is 1<<30 (i.e., 1 GB). It works for your case.
End users cannot set it to a much higher value.

root@862a17075444:/workspace# tlt-converter -h
usage: tlt-converter [-h] [-v] [-e ENGINE_FILE_PATH]
[-k ENCODE_KEY] [-c CACHE_FILE]
[-o OUTPUTS] [-d INPUT_DIMENSIONS]
[-b BATCH_SIZE] [-m MAX_BATCH_SIZE]
[-w MAX_WORKSPACE_SIZE] [-t DATA_TYPE]
[-i INPUT_ORDER] [-s] [-u DLA_CORE]
input_file

Generate TensorRT engine from exported model

positional arguments:
input_file Input file (.etlt exported model).

required flag arguments:
-d comma separated list of input dimensions(not required for TLT 3.0 new models).
-k model encoding key.

optional flag arguments:
-b calibration batch size (default 8).
-c calibration cache file (default cal.bin).
-e file the engine is saved to (default saved.engine).
-i input dimension ordering – nchw, nhwc, nc (default nchw).
-m maximum TensorRT engine batch size (default 16). If meet with out-of-memory issue, please decrease the batch size accordingly.
-o comma separated list of output node names (default none).
-p comma separated list of optimization profile shapes in the format <input_name>,<min_shape>,<opt_shape>,<max_shape>, where each shape has the format: xxx. Can be specified multiple times if there are multiple input tensors for the model. This argument is only useful in dynamic shape case.
-s TensorRT strict_type_constraints flag for INT8 mode(default false).
-t TensorRT data type – fp32, fp16, int8 (default fp32).
-u Use DLA core N for layers that support DLA(default = -1, which means no DLA core will be utilized for inference. Note that it’ll always allow GPU fallback).
-w maximum workspace size of TensorRT engine (default 1<<30). If meet with out-of-memory issue, please increase the workspace size accordingly.
