Dynamic Shapes

Description

Sometimes I get models from others on my team that I need to convert to ONNX and then run inference on to measure performance metrics. I have noticed that the models sometimes have a dynamic shape on the input tensor, whereas I run my metrics on fixed shapes. For example, I have received models with tensor shape (?, C, H, W).

In those cases, C, H, and W are fixed, but the first dimension is left undefined in the ONNX model file (even though I already know the fixed value I want to run inference with). I have noticed that if I want to use these models with an optimizing inference engine like TensorRT, this can cause a problem: in order to use the model, I have to perform intermediary steps with both a preprocessor engine and a prediction engine so that I can reshape the input and then run inference.

Correct me if I am wrong, but I believe this could add overhead that I would rather avoid. If the fixed inference shape is known, is it possible to build the engine in such a way that it can be serialized, saved, and loaded later for inference without having to do any reshaping?

Environment

TensorRT Version: 7
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version: Windows 10 64-bit

Please refer to the link below for working with dynamic shapes:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-700/tensorrt-developer-guide/index.html#work_dynamic_shapes

You can tune the model for a specific input dimension range using optimization profiles:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-700/tensorrt-developer-guide/index.html#opt_profiles
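
Since you already know the fixed shape you want to run with, you can also pin all three selectors of the profile to the same dimensions, which effectively builds a fixed-shape engine. A rough sketch against the TensorRT 7 C++ API (the input name "input", the 1x1x224x224 shape, and the file name are placeholders for your model):

#include <iostream>

#include "NvInfer.h"
#include "NvOnnxParser.h"

// Minimal logger; a real application would route messages more carefully.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cerr << msg << std::endl;
    }
} gLogger;

void buildFixedShapeEngine()
{
    auto builder = nvinfer1::createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile("super_resolution.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1 << 24);

    // Pinning kMIN == kOPT == kMAX fixes the dynamic batch dimension to 1.
    auto profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 1, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{1, 1, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{1, 1, 224, 224});
    config->addOptimizationProfile(profile);

    auto engine = builder->buildEngineWithConfig(*network, *config);
    // engine->serialize() can then be written to disk and reloaded later.
}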

Thanks

@SunilJB So I tried to follow the example provided by TensorRT, but I seem to have run into a problem. The model I am trying to work with is: https://github.com/onnx/models/tree/master/vision/super_resolution/sub_pixel_cnn_2016

The code I have implemented:

samplesCommon::OnnxSampleParams initializeSampleParams()
{
    samplesCommon::OnnxSampleParams params;
    params.dataDirs.push_back("data/mnist/");
    params.dataDirs.push_back("data/samples/mnist/");
    /*params.onnxFileName = "mnist.onnx";
    params.inputTensorNames.push_back("Input3");
    params.outputTensorNames.push_back("Plus214_Output_0");*/
    params.onnxFileName = "super_resolution.onnx";
    params.inputTensorNames.push_back("input");
    params.outputTensorNames.push_back("output");
    params.batchSize = 1;
    params.int8 = false;
    params.fp16 = false;
    return params;
}

void SampleDynamicReshape::build()
{
    auto builder = makeUnique(nvinfer1::createInferBuilder(gLogger.getTRTLogger()));

    // This function will also set mPredictionInputDims and mPredictionOutputDims,
    // so it needs to be called before building the preprocessor.
    buildPredictionEngine(builder);
    buildPreprocessorEngine(builder);
}

void SampleDynamicReshape::buildPreprocessorEngine(const SampleUniquePtr<nvinfer1::IBuilder>& builder)
{
    // Create the preprocessor engine using a network that supports full dimensions (createNetworkV2).
    auto preprocessorNetwork = makeUnique(
        builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));

    // Reshape a dynamically shaped input to the size expected by the model.
    //auto input = preprocessorNetwork->addInput("input", nvinfer1::DataType::kFLOAT, Dims4{ 1, 1, -1, -1 });
    auto input = preprocessorNetwork->addInput("input", nvinfer1::DataType::kFLOAT, Dims4{ -1, 1, 1, 1 });
    auto resizeLayer = preprocessorNetwork->addResize(*input);
    resizeLayer->setOutputDimensions(mPredictionInputDims);
    preprocessorNetwork->markOutput(*resizeLayer->getOutput(0));

    // Finally, configure and build the preprocessor engine.
    auto preprocessorConfig = makeUnique(builder->createBuilderConfig());

    // Create an optimization profile so that we can specify a range of input dimensions.
    auto profile = builder->createOptimizationProfile();

    // This profile will be valid for all images whose size falls in the range of [(1, 1, 1, 1), (1, 1, 224, 224)],
    // and TensorRT will optimize for (1, 1, 224, 224).
    /*profile->setDimensions(input->getName(), OptProfileSelector::kMIN, Dims4{ 1, 1, 1, 1 });
    profile->setDimensions(input->getName(), OptProfileSelector::kOPT, Dims4{ 1, 1, 28, 28 });
    profile->setDimensions(input->getName(), OptProfileSelector::kMAX, Dims4{ 1, 1, 56, 56 });*/
    profile->setDimensions(input->getName(), OptProfileSelector::kMIN, Dims4{ 1, 1, 1, 1 });
    profile->setDimensions(input->getName(), OptProfileSelector::kOPT, Dims4{ 1, 1, 224, 224 });
    profile->setDimensions(input->getName(), OptProfileSelector::kMAX, Dims4{ 1, 1, 224, 224 });
    preprocessorConfig->addOptimizationProfile(profile);

    mPreprocessorEngine = makeUnique(builder->buildEngineWithConfig(*preprocessorNetwork, *preprocessorConfig));
    gLogInfo << "Profile dimensions in preprocessor engine:\n";
    gLogInfo << "    Minimum = " << mPreprocessorEngine->getProfileDimensions(0, 0, OptProfileSelector::kMIN) << '\n';
    gLogInfo << "    Optimum = " << mPreprocessorEngine->getProfileDimensions(0, 0, OptProfileSelector::kOPT) << '\n';
    gLogInfo << "    Maximum = " << mPreprocessorEngine->getProfileDimensions(0, 0, OptProfileSelector::kMAX)
        << std::endl;
}

void SampleDynamicReshape::buildPredictionEngine(const SampleUniquePtr<nvinfer1::IBuilder>& builder)
{
    // Create a network using the parser.
    const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = makeUnique(builder->createNetworkV2(explicitBatch));
    auto parser = nvonnxparser::createParser(*network, gLogger.getTRTLogger());
    bool parsingSuccess = parser->parseFromFile(
        locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(), static_cast<int>(gLogger.getReportableSeverity()));
    if (!parsingSuccess)
    {
        throw std::runtime_error{ "Failed to parse model" };
    }

    /*// Attach a softmax layer to the end of the network.
    auto softmax = network->addSoftMax(*network->getOutput(0));
    // Set softmax axis to 1 since network output has shape [1, 10] in full dims mode
    softmax->setAxes(1 << 1);
    network->unmarkOutput(*network->getOutput(0));
    network->markOutput(*softmax->getOutput(0));*/

    // Get information about the inputs/outputs directly from the model.
    mPredictionInputDims = network->getInput(0)->getDimensions();
    mPredictionOutputDims = network->getOutput(0)->getDimensions();

    // Create a builder config.
    auto config = makeUnique(builder->createBuilderConfig());
    config->setMaxWorkspaceSize(16_MiB);
    if (mParams.fp16)
    {
        config->setFlag(BuilderFlag::kFP16);
    }
    if (mParams.int8)
    {
        config->setFlag(BuilderFlag::kINT8);
        samplesCommon::setAllTensorScales(network.get(), 127.0f, 127.0f);
    }
    // Build the prediction engine.
    mPredictionEngine = makeUnique(builder->buildEngineWithConfig(*network, *config));
}

So the model seems to load and the input/output information is read, but I get an error on the following line:
mPredictionEngine = makeUnique(builder->buildEngineWithConfig(*network, *config));

[E] [TRT] Network has dynamic or shape inputs, but no optimization profile has been defined.
[E] [TRT] Network validation failed.

It seems that the right input shape is detected.

Hi,

The input dimension of the model is input: [batch_size, 1, 224, 224].
Since the batch size is the only dynamic element, the build will fail if you try to change any other dimension:

trtexec --onnx=super-resolution-10.onnx --explicitBatch --verbose --minShapes=input:1x1x1x1 --optShapes=input:1x1x28x28 --maxShapes=input:1x1x56x56

[06/23/2020-04:58:53] [E] [TRT] input: for dimension number 2 in profile 0 does not match network definition (got min=1, opt=28, max=56), expected min=opt=max=224).
[06/23/2020-04:58:53] [E] [TRT] Network validation failed.
[06/23/2020-04:58:53] [E] Engine creation failed
[06/23/2020-04:58:53] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # trtexec --onnx=super-resolution-10.onnx --explicitBatch --verbose --minShapes=input:1x1x1x1 --optShapes=input:1x1x28x28 --maxShapes=input:1x1x56x56

You have to use an optimization profile, something like this:
trtexec --onnx=super-resolution-10.onnx --explicitBatch --verbose --minShapes=input:1x1x224x224 --optShapes=input:16x1x224x224 --maxShapes=input:32x1x224x224
[06/23/2020-05:04:32] [I] GPU Compute
[06/23/2020-05:04:32] [I] min: 16.8679 ms
[06/23/2020-05:04:32] [I] max: 17.492 ms
[06/23/2020-05:04:32] [I] mean: 17.2015 ms
[06/23/2020-05:04:32] [I] median: 17.2153 ms
[06/23/2020-05:04:32] [I] percentile: 17.4838 ms at 99%
[06/23/2020-05:04:32] [I] total compute time: 3.04467 s
&&&& PASSED TensorRT.trtexec # trtexec --onnx=super-resolution-10.onnx --explicitBatch --verbose --minShapes=input:1x1x224x224 --optShapes=input:16x1x224x224 --maxShapes=input:32x1x224x224

Thanks

@SunilJB Thanks for the reply.

So as a more general question, when would one use trtexec versus using the C++ API? Would using trtexec the way you did end up creating a saved serialized engine that you can later load from file using the C++ API?

Thanks for the clarification on the specific dimensions that can be changed for this example. I tried using your shapes for min, opt, and max, but I still get the same error on this line:

mPredictionEngine = makeUnique(builder->buildEngineWithConfig(*network, *config));

I get the following on the console:
[E] [TRT] Network has dynamic or shape inputs, but no optimization profile has been defined.
[E] [TRT] Network validation failed.

In fact, this error occurs before even entering the buildPreprocessorEngine(builder) method, which is where the shapes are specified.

The order of the calls is:

void SampleDynamicReshape::build()
{
    auto builder = makeUnique(nvinfer1::createInferBuilder(gLogger.getTRTLogger()));

    // This function will also set mPredictionInputDims and mPredictionOutputDims,
    // so it needs to be called before building the preprocessor.
    buildPredictionEngine(builder);
    buildPreprocessorEngine(builder);
}

The error occurs within the buildPredictionEngine(builder) call, and an optimization profile is not specified until the buildPreprocessorEngine(builder) call.

I am taking this from the Github example: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleDynamicReshape

It seems that I must be doing something out of order, but it looks like the example does it the same way.
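
Reading the error again, I suspect the difference from the sample is that the super-resolution model has a dynamic batch dimension, so the prediction network itself needs its own optimization profile on its builder config (the MNIST model used by the sample has a fully static input, so its prediction engine never needed one). Presumably something like this inside buildPredictionEngine, before buildEngineWithConfig, is what is missing (just a sketch, reusing the shapes suggested above):

// Sketch: give the prediction network its own optimization profile covering
// the dynamic batch dimension. The min/opt/max batch sizes mirror the trtexec
// example above; adjust them to whatever range you actually need.
auto predictionProfile = builder->createOptimizationProfile();
const char* inputName = network->getInput(0)->getName();
predictionProfile->setDimensions(inputName, OptProfileSelector::kMIN, Dims4{ 1, 1, 224, 224 });
predictionProfile->setDimensions(inputName, OptProfileSelector::kOPT, Dims4{ 16, 1, 224, 224 });
predictionProfile->setDimensions(inputName, OptProfileSelector::kMAX, Dims4{ 32, 1, 224, 224 });
config->addOptimizationProfile(predictionProfile);

// Build the prediction engine.
mPredictionEngine = makeUnique(builder->buildEngineWithConfig(*network, *config));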

trtexec is a tool for quickly exercising TensorRT without having to develop your own application.

You can add the --saveEngine=<filename> argument to the trtexec command to save the generated engine.
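
If you then want to load that saved engine from your own C++ application, something along these lines should work (a rough sketch; the file path is a placeholder and the logger is assumed to be one you already have):

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

#include "NvInfer.h"

// Sketch: deserialize an engine previously written by "trtexec --saveEngine=<filename>".
nvinfer1::ICudaEngine* loadEngine(const std::string& path, nvinfer1::ILogger& logger)
{
    // Read the serialized engine file into memory.
    std::ifstream file(path, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

    // Deserialize it with the runtime API (plugin factory argument left as nullptr).
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    return runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}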

Please refer to the link below for more details about the sample:

Thanks

@SunilJB Is the serialized engine that gets saved supposed to have a fixed shape or a dynamic shape? I tried it, and it seems that the input and output tensor dimensions are still dynamically shaped.
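
From what I can tell from the developer guide, that seems to be expected: an engine built with a dynamic dimension keeps -1 in its binding dimensions, and the concrete shape is only supplied per execution context at inference time, roughly like this (binding index 0 and the 1x1x224x224 shape are assumptions for this model):

// Sketch: with a dynamic-shape engine, the concrete input dimensions are set on
// the execution context before running inference; they are not baked into the
// serialized engine itself.
nvinfer1::IExecutionContext* context = engine->createExecutionContext();
context->setBindingDimensions(0, nvinfer1::Dims4{ 1, 1, 224, 224 });
// After all dynamic bindings are specified, context->executeV2(bindings) runs as usual.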