Bugs in the CustomQKVToContextPluginDynamic Plugin

Description

Hi, I’m trying to use the CustomQKVToContextPluginDynamic plugin in my TensorRT engine, but it fails in some cases.

  1. With plugin_version=1 and type_id=0, everything works fine.
  2. With plugin_version=1 and type_id=1, trtexec raises Error[9]: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error (/CustomQKVToContextPluginDynamic: could not find any supported formats consistent with input/output data types).
  3. With plugin_version>1, nothing works at all (see the registry check below).
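
To see which versions of the plugin are actually registered in my container, here is a quick sketch using the TensorRT Python bindings (assuming the standard plugin library initializes as usual):

import tensorrt as trt

# load the standard TensorRT plugins, then list the matching creators
trt.init_libnvinfer_plugins(trt.Logger(trt.Logger.WARNING), '')
for creator in trt.get_plugin_registry().plugin_creator_list:
    if creator.name == 'CustomQKVToContextPluginDynamic':
        print(creator.name, creator.plugin_version)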

Environment

NVIDIA Docker container 22.12

Relevant Files

Steps To Reproduce

I have the following code:

import torch
import torch.nn as nn

# https://github.com/NVIDIA/TensorRT/tree/release/8.5/plugin/bertQKVToContextPlugin
# The yaml file says that version 3 is not supported yet.

class CustomQKVToContextPluginDynamic(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, hidden_size, num_heads):
        # Placeholder for tracing only: the real plugin consumes packed QKV of
        # shape (S, B, 3*E, 1, 1) and emits (S, B, E, 1, 1), so return one third
        # of the channels to keep the traced output shape consistent.
        return input.narrow(2, 0, input.size(2) // 3)
    @staticmethod
    def symbolic(g, input, hidden_size, num_heads):
        return g.op("CustomQKVToContextPluginDynamic", input, plugin_version_s='1', type_id_i=0, hidden_size_i=hidden_size, num_heads_i=num_heads, has_mask_i=False)

class MyModule(nn.Module):
    def __init__(self, hidden_size, num_heads):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.size_per_head = hidden_size // num_heads
        self.Wq = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wk = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wv = nn.Linear(self.hidden_size, self.hidden_size)
    def forward(self, x):
        # x: (seq_len, batch_size, hidden_size) -> output: (seq_len, batch_size, hidden_size)
        Q = self.Wq(x)
        K = self.Wk(x)
        V = self.Wv(x)
        qkv = torch.cat([Q, K, V], dim=2)  # (S, B, 3*E)
        qkv = qkv.view(x.size(0), x.size(1), 3, self.num_heads, self.size_per_head)
        # (S, B, 3, H, D) -> (S, B, H, 3, D): interleave Q/K/V per head, then
        # flatten to the packed (S, B, 3*E, 1, 1) layout the plugin expects
        qkv = qkv.transpose(2, 3).contiguous().view(x.size(0), x.size(1), 3*self.hidden_size, 1, 1)
        # drop the two trailing singleton dims of the plugin output (S, B, E, 1, 1)
        return CustomQKVToContextPluginDynamic.apply(qkv, self.hidden_size, self.num_heads).select(-1, 0).select(-1, 0)

model = MyModule(768, 8).cuda()#.half()
input = torch.randn(512, 2, 768).cuda()#.half()

from torch.onnx import OperatorExportTypes
torch.onnx.export(model, (input,), 'test.onnx', operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH, input_names=['input_0'], output_names=['output_0'])

This exports an ONNX file, which I then convert into an engine with trtexec --onnx=test.onnx --saveEngine=test.trt.
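
Before building, the exported graph can be sanity-checked to confirm that the custom node and its attributes survived the export. A minimal sketch, assuming the onnx Python package is installed:

import onnx

# print the custom plugin node with its attributes from the exported graph
m = onnx.load('test.onnx')
for node in m.graph.node:
    if node.op_type == 'CustomQKVToContextPluginDynamic':
        print(onnx.helper.printable_node(node))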

I found what was wrong… I forgot to add the --fp16 flag to the trtexec command (type_id=1 selects FP16, so the builder needs FP16 enabled).

Now the magic happens… when I change seq_len from 512 to 128, the engine no longer works…

import torch
import torch.nn as nn

# https://github.com/NVIDIA/TensorRT/tree/release/8.5/plugin/bertQKVToContextPlugin
# The yaml file says that version 3 is not supported yet.

class CustomQKVToContextPluginDynamic(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, hidden_size, num_heads):
        # Placeholder for tracing only: the real plugin consumes packed QKV of
        # shape (S, B, 3*E, 1, 1) and emits (S, B, E, 1, 1), so return one third
        # of the channels to keep the traced output shape consistent.
        return input.narrow(2, 0, input.size(2) // 3)
    @staticmethod
    def symbolic(g, input, hidden_size, num_heads):
        # note: type_id_i=1 now selects FP16
        return g.op("CustomQKVToContextPluginDynamic", input, plugin_version_s='1', type_id_i=1, hidden_size_i=hidden_size, num_heads_i=num_heads, has_mask_i=False)

class MyModule(nn.Module):
    def __init__(self, hidden_size, num_heads):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.size_per_head = hidden_size // num_heads
        self.Wq = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wk = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wv = nn.Linear(self.hidden_size, self.hidden_size)
    def forward(self, x):
        # x: (seq_len, batch_size, hidden_size) -> output: (seq_len, batch_size, hidden_size)
        Q = self.Wq(x)
        K = self.Wk(x)
        V = self.Wv(x)
        qkv = torch.cat([Q, K, V], dim=2)  # (S, B, 3*E)
        qkv = qkv.view(x.size(0), x.size(1), 3, self.num_heads, self.size_per_head)
        # (S, B, 3, H, D) -> (S, B, H, 3, D): interleave Q/K/V per head, then
        # flatten to the packed (S, B, 3*E, 1, 1) layout the plugin expects
        qkv = qkv.transpose(2, 3).contiguous().view(x.size(0), x.size(1), 3*self.hidden_size, 1, 1)
        # drop the two trailing singleton dims of the plugin output (S, B, E, 1, 1)
        return CustomQKVToContextPluginDynamic.apply(qkv, self.hidden_size, self.num_heads).select(-1, 0).select(-1, 0)

model = MyModule(768, 8).cuda().half()
input = torch.randn(128, 2, 768).cuda().half()

from torch.onnx import OperatorExportTypes
torch.onnx.export(model, (input,), 'test.onnx', operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH, input_names=['input_0'], output_names=['output_0'])

Then running trtexec --onnx=test.onnx --saveEngine=test.trt --fp16 builds the engine successfully, but inference aborts with this error:

[02/10/2023-03:43:37] [I] Setting persistentCacheLimit to 0 bytes.
[02/10/2023-03:43:37] [I] Using random values for input input_0
[02/10/2023-03:43:37] [I] Created input binding for input_0 with dimensions 128x2x768
[02/10/2023-03:43:37] [I] Using random values for output output_0
[02/10/2023-03:43:37] [I] Created output binding for output_0 with dimensions 128x2x768
[02/10/2023-03:43:37] [I] Starting inference
[02/10/2023-03:43:37] [F] [TRT] Assertion failed: findIter != mFunctions.end()
/home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/plugin/bertQKVToContextPlugin/fused_multihead_attention/include/fused_multihead_attention.h:398
Aborting...

Aborted (core dumped)

The assertion comes from the fused multi-head attention kernel lookup in fused_multihead_attention.h, so it looks like no fused FP16 kernel was found for this combination (seq_len=128 with 8 heads of size 96) on my GPU.

Hi,

We were unable to reproduce the issue after changing 512 to 128; it works fine for us. Please use the latest TensorRT version, 8.5.3.

[02/14/2023-10:32:40] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8503] # trtexec --onnx=test.onnx --fp16 --verbose --workspace=20000
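
You can confirm the installed version from the Python bindings (assuming they are available in your environment):

import tensorrt as trt
print(trt.__version__)  # expect 8.5.3 or newer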

Thank you.

I see. However, the official Docker container doesn’t include 8.5.3 yet. Hoping for an update!
