Negative shape tensor on shuffle layer causes OOM

Description

When I set the shape tensor (-1, h*w) on a shuffle layer, the engine build fails with an OOM error for some optimization profiles.

[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: Internal error: could not find any implementation for node (Unnamed Layer* 6) [Shuffle], try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
[TensorRT] ERROR: …/builder/tacticOptimizer.cpp (1523) - OutOfMemory Error in computeCosts: 0
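
For context, the failing pattern is just a shuffle layer whose target shape is supplied as a runtime shape tensor containing a -1 wildcard; a minimal sketch (names match the full repro script below):

# minimal sketch of the failing pattern; input_trt and new_shape_trt
# are built exactly as in the full script below
layer = network.add_shuffle(input_trt)
layer.set_input(1, new_shape_trt)  # new_shape_trt == concat([-1], [h*w])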

Environment

TensorRT Version: 7.0
GPU Type: RTX 2080 Ti
Nvidia Driver Version: 440
CUDA Version: 10.0
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4
Baremetal or Container (if container which image + tag):

Steps To Reproduce

import tensorrt as trt 
import torch
import numpy as np


def main():

    print("create trt model")
    log_level = trt.Logger.ERROR
    logger = trt.Logger(log_level)
    builder = trt.Builder(logger)
    EXPLICIT_BATCH = 1 << (int)(
        trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(EXPLICIT_BATCH)
    input_name = 'input'
    output_name = 'output'
    input_trt = network.add_input(name=input_name, shape=[1, -1, -1, -1], dtype=trt.float32)

    # build the target shape tensor (-1, h*w) from the runtime input shape
    shape_trt = network.add_shape(input_trt).get_output(0)
    shape_h_trt = network.add_slice(shape_trt, [2], [1], [1]).get_output(0)  # h
    shape_w_trt = network.add_slice(shape_trt, [3], [1], [1]).get_output(0)  # w
    shape_hw_trt = network.add_elementwise(
        shape_h_trt, shape_w_trt, trt.ElementWiseOperation.PROD).get_output(0)  # h*w
    shape_minus_trt = network.add_constant(
        [1], np.array([-1], dtype=np.int32)).get_output(0)  # the -1 wildcard
    new_shape_trt = network.add_concatenation(
        [shape_minus_trt, shape_hw_trt]).get_output(0)  # (-1, h*w)

    # reshape via a shuffle layer whose target shape is the runtime shape tensor
    layer = network.add_shuffle(input_trt)
    layer.set_input(1, new_shape_trt)
    output = layer.get_output(0)
    output.name = output_name

    network.mark_output(output)
    max_workspace_size = 1 << 30
    fp16_mode = False

    # the builder config carries the workspace size and precision flags
    config = builder.create_builder_config()
    config.max_workspace_size = max_workspace_size
    profile = builder.create_optimization_profile()

    min_shape = (1, 4, 50, 50)
    opt_shape = (1, 16, 400, 400)
    max_shape = (1, 32, 800, 800)
    profile.set_shape(
        input_name, min_shape, opt_shape, max_shape)
    config.add_optimization_profile(profile)
    if fp16_mode:
        config.set_flag(trt.BuilderFlag.FP16)

    engine = builder.build_engine(network, config)
    assert engine is not None, 'failed to build engine'
    context = engine.create_execution_context()

    print("inference")
    input_torch = torch.rand(1, 16, 400, 600).cuda()

    bindings = [None] * 2
    idx = engine.get_binding_index(input_name)
    context.set_binding_shape(idx, tuple(input_torch.shape))
    bindings[idx] = input_torch.data_ptr()

    idx = engine.get_binding_index(output_name)
    shape = tuple(context.get_binding_shape(idx))
    output_torch = torch.empty(shape, dtype=torch.float32).cuda()
    bindings[idx] = output_torch.data_ptr()

    context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)

    print(output_torch.shape)
    print(output_torch.view(-1)[:10])



if __name__ == "__main__":
    main()

If the profile shapes are:
min_shape = (1, 4, 500, 500)
opt_shape = (1, 16, 600, 600)
max_shape = (1, 32, 800, 800)

or:
min_shape = (1, 4, 50, 50)
opt_shape = (1, 16, 100, 100)
max_shape = (1, 32, 200, 200)

everything works fine.

And if the shape tensor is (n*c, h*w) instead of (-1, h*w), it also works fine.
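
For reference, here is roughly how the working (n*c, h*w) variant looks; shape_trt and shape_hw_trt are constructed exactly as in the repro script above, and the only change is computing n*c from the input shape instead of using the -1 wildcard:

# sketch of the (n*c, h*w) variant that builds fine
shape_n_trt = network.add_slice(shape_trt, [0], [1], [1]).get_output(0)  # n
shape_c_trt = network.add_slice(shape_trt, [1], [1], [1]).get_output(0)  # c
shape_nc_trt = network.add_elementwise(
    shape_n_trt, shape_c_trt, trt.ElementWiseOperation.PROD).get_output(0)  # n*c
new_shape_trt = network.add_concatenation(
    [shape_nc_trt, shape_hw_trt]).get_output(0)  # (n*c, h*w), no -1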

Is there some limit on the min_shape and max_shape?

Thank you for sharing the details and the script to reproduce the issue.
We will look into it and update you accordingly.

Thanks

Hi,

A fix will be available in a future release. Please stay tuned.

Thanks