Bugs in the CustomQKVToContextPluginDynamic Plugin

Description

Hi, I’m trying to use the CustomQKVToContextPluginDynamic plugin in my TensorRT engine, but it fails in some cases.

  1. With plugin_version=1 and type_id=0, everything works fine.
  2. With plugin_version=1 and type_id=1, trtexec raises Error[9]: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error (/CustomQKVToContextPluginDynamic: could not find any supported formats consistent with input/output data types).
  3. With plugin_version>1, nothing works at all (see the registry check below).
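
To see which versions of the plugin are actually registered in my container, here is a quick sketch using the TensorRT Python bindings (assuming the standard plugin library initializes as usual):

import tensorrt as trt

# load the standard TensorRT plugins, then list the matching creators
trt.init_libnvinfer_plugins(trt.Logger(trt.Logger.WARNING), '')
for creator in trt.get_plugin_registry().plugin_creator_list:
    if creator.name == 'CustomQKVToContextPluginDynamic':
        print(creator.name, creator.plugin_version)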

Environment

NVIDIA Docker container 22.12

Relevant Files

Steps To Reproduce

I have the following code:

import torch
import torch.nn as nn

# https://github.com/NVIDIA/TensorRT/tree/release/8.5/plugin/bertQKVToContextPlugin
# The yaml file says that version 3 is not supported yet.

class CustomQKVToContextPluginDynamic(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, hidden_size, num_heads):
        # Placeholder for tracing only: the real plugin consumes packed QKV of
        # shape (S, B, 3*E, 1, 1) and emits (S, B, E, 1, 1), so return one third
        # of the channels to keep the traced output shape consistent.
        return input.narrow(2, 0, input.size(2) // 3)
    @staticmethod
    def symbolic(g, input, hidden_size, num_heads):
        return g.op("CustomQKVToContextPluginDynamic", input, plugin_version_s='1', type_id_i=0, hidden_size_i=hidden_size, num_heads_i=num_heads, has_mask_i=False)

class MyModule(nn.Module):
    def __init__(self, hidden_size, num_heads):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.size_per_head = hidden_size // num_heads
        self.Wq = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wk = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wv = nn.Linear(self.hidden_size, self.hidden_size)
    def forward(self, x):
        # x: (seq_len, batch_size, hidden_size) -> output: (seq_len, batch_size, hidden_size)
        Q = self.Wq(x)
        K = self.Wk(x)
        V = self.Wv(x)
        qkv = torch.cat([Q, K, V], dim=2)  # (S, B, 3*E)
        qkv = qkv.view(x.size(0), x.size(1), 3, self.num_heads, self.size_per_head)
        # (S, B, 3, H, D) -> (S, B, H, 3, D): interleave Q/K/V per head, then
        # flatten to the packed (S, B, 3*E, 1, 1) layout the plugin expects
        qkv = qkv.transpose(2, 3).contiguous().view(x.size(0), x.size(1), 3*self.hidden_size, 1, 1)
        # drop the two trailing singleton dims of the plugin output (S, B, E, 1, 1)
        return CustomQKVToContextPluginDynamic.apply(qkv, self.hidden_size, self.num_heads).select(-1, 0).select(-1, 0)

model = MyModule(768, 8).cuda()#.half()
input = torch.randn(512, 2, 768).cuda()#.half()

from torch.onnx import OperatorExportTypes
torch.onnx.export(model, (input,), 'test.onnx', operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH, input_names=['input_0'], output_names=['output_0'])

This exports an ONNX file, which I then convert into an engine with trtexec --onnx=test.onnx --saveEngine=test.trt.
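
Before building, the exported graph can be sanity-checked to confirm that the custom node and its attributes survived the export. A minimal sketch, assuming the onnx Python package is installed:

import onnx

# print the custom plugin node with its attributes from the exported graph
m = onnx.load('test.onnx')
for node in m.graph.node:
    if node.op_type == 'CustomQKVToContextPluginDynamic':
        print(onnx.helper.printable_node(node))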

I found what was wrong… I forgot to add the --fp16 flag to the trtexec command (type_id=1 selects FP16, so the builder needs FP16 enabled).

Now the magic happens… when I change seq_len from 512 to 128, the engine no longer works…

import torch
import torch.nn as nn

# https://github.com/NVIDIA/TensorRT/tree/release/8.5/plugin/bertQKVToContextPlugin
# The yaml file says that version 3 is not supported yet.

class CustomQKVToContextPluginDynamic(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, hidden_size, num_heads):
        # Placeholder for tracing only: the real plugin consumes packed QKV of
        # shape (S, B, 3*E, 1, 1) and emits (S, B, E, 1, 1), so return one third
        # of the channels to keep the traced output shape consistent.
        return input.narrow(2, 0, input.size(2) // 3)
    @staticmethod
    def symbolic(g, input, hidden_size, num_heads):
        # note: type_id_i=1 now selects FP16
        return g.op("CustomQKVToContextPluginDynamic", input, plugin_version_s='1', type_id_i=1, hidden_size_i=hidden_size, num_heads_i=num_heads, has_mask_i=False)

class MyModule(nn.Module):
    def __init__(self, hidden_size, num_heads):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.size_per_head = hidden_size // num_heads
        self.Wq = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wk = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wv = nn.Linear(self.hidden_size, self.hidden_size)
    def forward(self, x):
        # x: (seq_len, batch_size, hidden_size) -> output: (seq_len, batch_size, hidden_size)
        Q = self.Wq(x)
        K = self.Wk(x)
        V = self.Wv(x)
        qkv = torch.cat([Q, K, V], dim=2)  # (S, B, 3*E)
        qkv = qkv.view(x.size(0), x.size(1), 3, self.num_heads, self.size_per_head)
        # (S, B, 3, H, D) -> (S, B, H, 3, D): interleave Q/K/V per head, then
        # flatten to the packed (S, B, 3*E, 1, 1) layout the plugin expects
        qkv = qkv.transpose(2, 3).contiguous().view(x.size(0), x.size(1), 3*self.hidden_size, 1, 1)
        # drop the two trailing singleton dims of the plugin output (S, B, E, 1, 1)
        return CustomQKVToContextPluginDynamic.apply(qkv, self.hidden_size, self.num_heads).select(-1, 0).select(-1, 0)

model = MyModule(768, 8).cuda().half()
input = torch.randn(128, 2, 768).cuda().half()

from torch.onnx import OperatorExportTypes
torch.onnx.export(model, (input,), 'test.onnx', operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH, input_names=['input_0'], output_names=['output_0'])

Then running trtexec --onnx=test.onnx --saveEngine=test.trt --fp16 builds the engine successfully, but inference aborts with this error:

[02/10/2023-03:43:37] [I] Setting persistentCacheLimit to 0 bytes.
[02/10/2023-03:43:37] [I] Using random values for input input_0
[02/10/2023-03:43:37] [I] Created input binding for input_0 with dimensions 128x2x768
[02/10/2023-03:43:37] [I] Using random values for output output_0
[02/10/2023-03:43:37] [I] Created output binding for output_0 with dimensions 128x2x768
[02/10/2023-03:43:37] [I] Starting inference
[02/10/2023-03:43:37] [F] [TRT] Assertion failed: findIter != mFunctions.end()
/home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/plugin/bertQKVToContextPlugin/fused_multihead_attention/include/fused_multihead_attention.h:398
Aborting...

Aborted (core dumped)

The assertion comes from the fused multi-head attention kernel lookup in fused_multihead_attention.h, so it looks like no fused FP16 kernel was found for this combination (seq_len=128 with 8 heads of size 96) on my GPU.

Hi,

We were unable to reproduce the issue after changing 512 to 128; it works fine for us. Please use the latest TensorRT version, 8.5.3.

[02/14/2023-10:32:40] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8503] # trtexec --onnx=test.onnx --fp16 --verbose --workspace=20000
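
You can confirm the installed version from the Python bindings (assuming they are available in your environment):

import tensorrt as trt
print(trt.__version__)  # expect 8.5.3 or newer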

Thank you.

I see. However, the official Docker container doesn’t include 8.5.3 yet. Hoping for an update!
