Workspace Size Error Caused by Merging Multiple Conv+Relu Layers on DRIVE AGX

Description

Hi, here is a simplified version of my model. When I build it from ONNX, TensorRT raises an error saying the workspace size is not large enough. The error has been solved in TRT7, but it is hard to use TRT7 on Pegasus. Is this an internal bug in TRT6?

My model in PyTorch:

import torch
import torch.nn.functional as F

class Test(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.conv0 = torch.nn.Conv3d(128, 64, 3, 2, 1)
        self.conv1 = torch.nn.Conv3d(128, 64, 3, 2, 1)
        self.conv2 = torch.nn.Conv3d(128, 64, 3, 2, 1)

    def forward(self, x):
        y0 = self.conv0(x)
        y1 = self.conv1(x)
        y2 = self.conv2(x)

        y0 = F.relu(y0)
        y1 = F.relu(y1)
        y2 = F.relu(y2)

        # 'squeeze' can also be used here instead of 'view'
        y0 = y0.view(y0.shape[:2])
        y1 = y1.view(y1.shape[:2])
        y2 = y2.view(y2.shape[:2])

        return y0, y1, y2

if __name__ == '__main__':
    t = Test()

    x = torch.randn((50, 128, 2, 2, 2))

    y0, y1, y2 = t(x)

    torch.onnx.export(
        t,
        (x,),
        'test.onnx',
        verbose=True,
        opset_version=11
    )

The verbose log:

INFO: --------------- Layers running on GPU: 
INFO: Conv_0 + Relu_3 || Conv_1 + Relu_4 || Conv_2 + Relu_5, Squeeze_6 + Squeeze_9 + Squeeze_12, Squeeze_7 + Squeeze_10 + Squeeze_13, Squeeze_8 + Squeeze_11 + Squeeze_14, 
VERBOSE: Constructing optimization profile number 0 out of 1
VERBOSE: *************** Autotuning format combination: Float(1,2,4,8,1024) -> Float(1,1,1,192,12288) ***************
VERBOSE: --------------- Timing Runner: Conv_0 + Relu_3 || Conv_1 + Relu_4 || Conv_2 + Relu_5 (FusedConvActConvolution)
VERBOSE: FusedConvActConvolution has no valid tactics for this config, skipping
VERBOSE: --------------- Timing Runner: Conv_0 + Relu_3 || Conv_1 + Relu_4 || Conv_2 + Relu_5 (CaskConvolution)
VERBOSE: CaskConvolution has no valid tactics for this config, skipping
VERBOSE: --------------- Timing Runner: Conv_0 + Relu_3 || Conv_1 + Relu_4 || Conv_2 + Relu_5 (CudaConvolution)
VERBOSE: CudaConvolution has no valid tactics for this config, skipping
VERBOSE: --------------- Timing Runner: Conv_0 + Relu_3 || Conv_1 + Relu_4 || Conv_2 + Relu_5 (CudaDepthwiseConvolution)
VERBOSE: CudaDepthwiseConvolution has no valid tactics for this config, skipping
ERROR: Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
ERROR: ../builder/tacticOptimizer.cpp (1786) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node Conv_0 + Relu_3 || Conv_1 + Relu_4 || Conv_2 + Relu_5.)
VERBOSE: Builder timing cache: created 0 entries, 0 hit(s)

Environment

TensorRT Version: 6.3.1
GPU Type: First device on Pegasus
CUDA Version: 10.2
CUDNN Version: 7.6.6
Operating System + Version: Ubuntu 18.04.2 LTS
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.9

Hi,
Please share the ONNX model and the script, if you have not already, so that we can assist you better.
Meanwhile, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

1) Validate your model with the snippet below.

check_model.py

import onnx

filename = "yourONNXmodel"  # placeholder: replace with the path to your ONNX file
model = onnx.load(filename)
# Raises an exception if the model is not a valid ONNX graph
onnx.checker.check_model(model)

2) Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

Here is the ONNX model.

test_opt.onnx (2.5 MB)

After I mark the outputs of the Relu layers as network outputs, these Conv+Relu layers are no longer merged.
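
For anyone who wants to try the same workaround, here is a minimal sketch of how it might look with the TensorRT Python API. It assumes the Python bindings are available on the machine that builds the engine, which may not be the case on every DRIVE setup. The helper names `mark_activation_outputs` and `build_engine` are hypothetical, and the 1 GiB workspace value is only an illustration of the setting the error message refers to. Note that marking the Relu outputs adds extra output bindings to the engine.

```python
def mark_activation_outputs(network, activation_type):
    """Mark the output of every activation layer as a network output.

    `network` is expected to behave like tensorrt.INetworkDefinition and
    `activation_type` like tensorrt.LayerType.ACTIVATION (hypothetical helper,
    not part of TensorRT). Returns the names of the layers that were marked.
    """
    marked = []
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type == activation_type:
            # Exposing the tensor as an output keeps the builder from
            # merging the parallel Conv+Relu branches into one node.
            network.mark_output(layer.get_output(0))
            marked.append(layer.name)
    return marked


def build_engine(onnx_path, workspace_bytes=1 << 30):
    """Parse an ONNX file, apply the workaround, and build an engine.

    Assumption: TensorRT Python bindings are installed; imported lazily so
    the helper above can be used (and tested) without them.
    """
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0).desc())

    mark_activation_outputs(network, trt.LayerType.ACTIVATION)

    config = builder.create_builder_config()
    config.max_workspace_size = workspace_bytes  # illustrative value
    return builder.build_engine(network, config)
```

Calling `build_engine('test.onnx')` would then build the exported model with the Relu outputs exposed. This only avoids the merge; it does not explain why the merged node has no valid tactic in TRT6.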

Hi,

We are unable to reproduce this issue on newer TensorRT versions.

Thanks for your reply.
Because the newest toolkit for NVIDIA DRIVE AGX only supports TRT6, all we can do is work around the error.

Hi,

How did you manage to install PyTorch on Pegasus AGX?
Regardless of which PyTorch wheel file I install from this link, I keep getting this error when I try to import torch after installing it.

ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
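
The error says the wheel was built against cuDNN 8, while the environment listed earlier in this thread reports cuDNN 7.6.6. A quick way to see which cuDNN major versions the dynamic loader can actually find is sketched below. The helper name `probe_libraries` is hypothetical, and the list of library names is an assumption based on the error message and that environment listing.

```python
import ctypes


def probe_libraries(names):
    """Return {name: bool} telling whether the dynamic loader can open each library."""
    found = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            found[name] = True
        except OSError:
            # The loader could not find or open the shared object.
            found[name] = False
    return found


if __name__ == "__main__":
    # Assumed names: cuDNN 7 (reported in this thread) and cuDNN 8 (requested by the wheel).
    for name, ok in probe_libraries(["libcudnn.so.7", "libcudnn.so.8"]).items():
        print(name, "loadable" if ok else "not found")
```

If only `libcudnn.so.7` is loadable, the wheel likely does not match the cuDNN version installed on the device.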