Negative shape tensor on shuffle layer causes OOM

Description

When I set the shape tensor (-1, h*w) on a shuffle layer, the engine build fails with an OOM error for some optimization profiles.

[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: Internal error: could not find any implementation for node (Unnamed Layer* 6) [Shuffle], try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
[TensorRT] ERROR: …/builder/tacticOptimizer.cpp (1523) - OutOfMemory Error in computeCosts: 0
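
For context, the failing pattern is just a shuffle layer whose target shape is supplied as a runtime shape tensor containing a -1 wildcard; a minimal sketch (names match the full repro script below):

# minimal sketch of the failing pattern; input_trt and new_shape_trt
# are built exactly as in the full script below
layer = network.add_shuffle(input_trt)
layer.set_input(1, new_shape_trt)  # new_shape_trt == concat([-1], [h*w])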

Environment

TensorRT Version: 7.0
GPU Type: RTX 2080 Ti
Nvidia Driver Version: 440
CUDA Version: 10.0
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4
Baremetal or Container (if container which image + tag):

Steps To Reproduce

import tensorrt as trt 
import torch
import numpy as np


def main():

    print("create trt model")
    log_level = trt.Logger.ERROR
    logger = trt.Logger(log_level)
    builder = trt.Builder(logger)
    EXPLICIT_BATCH = 1 << (int)(
        trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(EXPLICIT_BATCH)
    input_name = 'input'
    output_name = 'output'
    input_trt = network.add_input(name=input_name, shape=[1, -1, -1, -1], dtype=trt.float32)

    # build the target shape tensor (-1, h*w) from the runtime input shape
    shape_trt = network.add_shape(input_trt).get_output(0)
    shape_h_trt = network.add_slice(shape_trt, [2], [1], [1]).get_output(0)  # h
    shape_w_trt = network.add_slice(shape_trt, [3], [1], [1]).get_output(0)  # w
    shape_hw_trt = network.add_elementwise(
        shape_h_trt, shape_w_trt, trt.ElementWiseOperation.PROD).get_output(0)  # h*w
    shape_minus_trt = network.add_constant(
        [1], np.array([-1], dtype=np.int32)).get_output(0)  # the -1 wildcard
    new_shape_trt = network.add_concatenation(
        [shape_minus_trt, shape_hw_trt]).get_output(0)  # (-1, h*w)

    # reshape via a shuffle layer whose target shape is the runtime shape tensor
    layer = network.add_shuffle(input_trt)
    layer.set_input(1, new_shape_trt)
    output = layer.get_output(0)
    output.name = output_name

    network.mark_output(output)
    max_workspace_size = 1 << 30
    fp16_mode = False

    # the builder config carries the workspace size and precision flags
    config = builder.create_builder_config()
    config.max_workspace_size = max_workspace_size
    profile = builder.create_optimization_profile()

    min_shape = (1, 4, 50, 50)
    opt_shape = (1, 16, 400, 400)
    max_shape = (1, 32, 800, 800)
    profile.set_shape(
        input_name, min_shape, opt_shape, max_shape)
    config.add_optimization_profile(profile)
    if fp16_mode:
        config.set_flag(trt.BuilderFlag.FP16)

    engine = builder.build_engine(network, config)
    assert engine is not None, 'failed to build engine'
    context = engine.create_execution_context()

    print("inference")
    input_torch = torch.rand(1, 16, 400, 600).cuda()

    bindings = [None] * 2
    idx = engine.get_binding_index(input_name)
    context.set_binding_shape(idx, tuple(input_torch.shape))
    bindings[idx] = input_torch.data_ptr()

    idx = engine.get_binding_index(output_name)
    shape = tuple(context.get_binding_shape(idx))
    output_torch = torch.empty(shape, dtype=torch.float32).cuda()
    bindings[idx] = output_torch.data_ptr()

    context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)

    print(output_torch.shape)
    print(output_torch.view(-1)[:10])



if __name__ == "__main__":
    main()

If the profile shapes are:
min_shape = (1, 4, 500, 500)
opt_shape = (1, 16, 600, 600)
max_shape = (1, 32, 800, 800)

or:
min_shape = (1, 4, 50, 50)
opt_shape = (1, 16, 100, 100)
max_shape = (1, 32, 200, 200)

everything works fine.

And if the shape tensor is (n*c, h*w) instead of (-1, h*w), it also works fine.
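
For reference, here is roughly how the working (n*c, h*w) variant looks; shape_trt and shape_hw_trt are constructed exactly as in the repro script above, and the only change is computing n*c from the input shape instead of using the -1 wildcard:

# sketch of the (n*c, h*w) variant that builds fine
shape_n_trt = network.add_slice(shape_trt, [0], [1], [1]).get_output(0)  # n
shape_c_trt = network.add_slice(shape_trt, [1], [1], [1]).get_output(0)  # c
shape_nc_trt = network.add_elementwise(
    shape_n_trt, shape_c_trt, trt.ElementWiseOperation.PROD).get_output(0)  # n*c
new_shape_trt = network.add_concatenation(
    [shape_nc_trt, shape_hw_trt]).get_output(0)  # (n*c, h*w), no -1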

Is there some limit on the min_shape and max_shape?

Thank you for sharing the details and the script to reproduce the issue.
We will look into it and update you accordingly.

Thanks

Hi,

A fix will be available in a future release. Please stay tuned.

Thanks