In my ONNX model, the input layer is named "input" and the output layers are named "output0", "output1", and "output2".
I based my code on the samples shipped with the TensorRT C++ package.
Here is the code for the network construction phase:
// create the builder first (nvinfer1::createInferBuilder(sample::gLogger.getTRTLogger()))
const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
parser = SampleUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, sample::gLogger.getTRTLogger()));
auto parsed = parser->parseFromFile("path_to_onnx_file.onnx", static_cast<int>(sample::gLogger.getReportableSeverity()));
builder->setMaxBatchSize(mParams.batchSize);
config->setMaxWorkspaceSize(10_GiB);
std::unique_ptr<IInt8Calibrator> calibrator;
//=====================
// DO SOMETHING TO THE NETWORK
//=====================
// DO THE CALIBRATION TO INT8
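For context, the DO THE CALIBRATION TO INT8 part follows the standard sample pattern, roughly like this (a sketch only; MyEntropyCalibrator stands in for my IInt8EntropyCalibrator2 implementation and is not the exact class name):

// Sketch only: MyEntropyCalibrator is a placeholder for my IInt8EntropyCalibrator2 subclass.
calibrator.reset(new MyEntropyCalibrator(/* calibration batch stream */));
config->setFlag(nvinfer1::BuilderFlag::kINT8);
config->setInt8Calibrator(calibrator.get());
// Build and serialize the engine with INT8 calibration enabled.
auto plan = SampleUniquePtr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));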
The error message is slightly different depending on which name I pass to addInput().
In the DO SOMETHING TO THE NETWORK section I have:
nvinfer1::ITensor* input_layer = network->getInput(0);
const char* input_name = "input"; // same as input_layer->getName()
nvinfer1::ITensor* input_int8_layer = network->addInput(input_name, nvinfer1::DataType::kINT8, input_layer->getDimensions());
ASSERT(input_int8_layer != nullptr);
The error message is:
[02/15/2022-09:21:44] [I] Building and running a GPU inference engine for MY_MODEL
The model is parsed from path_to_onnx_file.onnx file
[02/15/2022-09:21:44] [I] [TRT] [MemUsageChange] Init CUDA: CPU +324, GPU +0, now: CPU 335, GPU 501 (MiB)
[02/15/2022-09:21:45] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 335 MiB, GPU 501 MiB
[02/15/2022-09:21:45] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 469 MiB, GPU 535 MiB
[02/15/2022-09:21:45] [I] [TRT] ----------------------------------------------------------------
[02/15/2022-09:21:45] [I] [TRT] Input filename: path_to_onnx_file.onnx
[02/15/2022-09:21:45] [I] [TRT] ONNX IR version: 0.0.7
[02/15/2022-09:21:45] [I] [TRT] Opset version: 9
[02/15/2022-09:21:45] [I] [TRT] Producer name: pytorch
[02/15/2022-09:21:45] [I] [TRT] Producer version: 1.10
[02/15/2022-09:21:45] [I] [TRT] Domain:
[02/15/2022-09:21:45] [I] [TRT] Model version: 0
[02/15/2022-09:21:45] [I] [TRT] Doc string:
[02/15/2022-09:21:45] [I] [TRT] ----------------------------------------------------------------
[02/15/2022-09:21:45] [E] [TRT] [network.cpp::addInput::1507] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addInput::1507, condition: inName != knownInput->getName())
[02/15/2022-09:21:45] [E] Assertion failure: input_int8_layer != nullptr
However, if I call network->addInput() with a different name, as below, the error changes
(all other code stays exactly the same):
const char* input_name = "input_temp"; // different from input_layer->getName()
The error output is:
[02/15/2022-09:24:58] [I] Building and running a GPU inference engine for MY_MODEL
The model is parsed from path_to_onnx_file.onnx file
[02/15/2022-09:24:58] [I] [TRT] [MemUsageChange] Init CUDA: CPU +324, GPU +0, now: CPU 335, GPU 501 (MiB)
[02/15/2022-09:24:59] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 335 MiB, GPU 501 MiB
[02/15/2022-09:24:59] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 469 MiB, GPU 535 MiB
[02/15/2022-09:24:59] [I] [TRT] ----------------------------------------------------------------
[02/15/2022-09:24:59] [I] [TRT] Input filename: path_to_onnx_file.onnx
[02/15/2022-09:24:59] [I] [TRT] ONNX IR version: 0.0.7
[02/15/2022-09:24:59] [I] [TRT] Opset version: 9
[02/15/2022-09:24:59] [I] [TRT] Producer name: pytorch
[02/15/2022-09:24:59] [I] [TRT] Producer version: 1.10
[02/15/2022-09:24:59] [I] [TRT] Domain:
[02/15/2022-09:24:59] [I] [TRT] Model version: 0
[02/15/2022-09:24:59] [I] [TRT] Doc string:
[02/15/2022-09:24:59] [I] [TRT] ----------------------------------------------------------------
[02/15/2022-09:24:59] [I] Using Entropy Calibrator 2
[02/15/2022-09:24:59] [W] [TRT] Unused Input: input_temp
[02/15/2022-09:24:59] [W] [TRT] [RemoveDeadLayers] Input Tensor input_temp is unused or used only at compile-time, but is not being removed.
[02/15/2022-09:24:59] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[02/15/2022-09:24:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +484, GPU +206, now: CPU 977, GPU 749 (MiB)
[02/15/2022-09:24:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +401, GPU +204, now: CPU 1378, GPU 953 (MiB)
[02/15/2022-09:24:59] [I] [TRT] Timing cache disabled. Turning it on will improve builder speed.
[02/15/2022-09:25:01] [I] [TRT] Detected 2 inputs and 3 output network tensors.
[02/15/2022-09:25:01] [I] [TRT] Total Host Persistent Memory: 111728
[02/15/2022-09:25:01] [I] [TRT] Total Device Persistent Memory: 0
[02/15/2022-09:25:01] [I] [TRT] Total Scratch Memory: 0
[02/15/2022-09:25:01] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 136 MiB
[02/15/2022-09:25:01] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 16.4134ms to assign 7 blocks to 141 nodes requiring 139345920 bytes.
[02/15/2022-09:25:01] [I] [TRT] Total Activation Memory: 139345920
[02/15/2022-09:25:01] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[02/15/2022-09:25:01] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1816, GPU 1133 (MiB)
[02/15/2022-09:25:01] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1816, GPU 1141 (MiB)
[02/15/2022-09:25:01] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[02/15/2022-09:25:01] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1816, GPU 1117 (MiB)
[02/15/2022-09:25:01] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1816, GPU 1125 (MiB)
[02/15/2022-09:25:01] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +132, now: CPU 0, GPU 141 (MiB)
[02/15/2022-09:25:01] [E] [TRT] 2: [calibrator.cpp::calibrateEngine::1132] Error Code 2: Internal Error (Assertion lastInput + 1 == nbInputs failed. )
[02/15/2022-09:25:01] [E] [TRT] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
What I want to do is add an INT8 input layer to my INT8-quantized, serialized TensorRT engine,
because it cuts the input bandwidth to a quarter.
(As you may know, an engine built with the default TensorRT INT8 quantization and serialization still takes FP32 input data.)
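To make the intent concrete, this is the end state I am after, expressed as a rough sketch (whether setType()/setDynamicRange() on the existing input is the right way to get there is exactly what I am unsure about; the range values are only illustrative):

// Sketch of the intent: make the existing input tensor accept INT8 host data
// instead of adding a second input. Values below are illustrative, not calibrated.
nvinfer1::ITensor* in = network->getInput(0);
in->setType(nvinfer1::DataType::kINT8);       // INT8 host buffer: 1 byte per element instead of 4
in->setDynamicRange(-127.0f, 127.0f);         // illustrative dynamic range for the input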
What could be the problem?