Bugs in the CustomQKVToContextPluginDynamic plugin


Hi, I’m trying to use the CustomQKVToContextPluginDynamic plugin in my TensorRT engine, but it fails in some cases.

  1. For plugin_version=1, type_id=0, everything works fine.
  2. For plugin_version=1, type_id=1, trtexec raises Error[9]: [pluginV2Builder.cpp::reportPluginError::23] Error Code 9: Internal Error (/CustomQKVToContextPluginDynamic: could not find any supported formats consistent with input/output data types).
  3. For plugin_version>1, nothing works.


Environment: NVIDIA Docker container 22.12

Steps To Reproduce

I have the following code:

import torch
import torch.nn as nn

# https://github.com/NVIDIA/TensorRT/tree/release/8.5/plugin/bertQKVToContextPlugin
# The yaml file says that version 3 is not supported yet.

class CustomQKVToContextPluginDynamic(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, hidden_size, num_heads):
        # Identity placeholder used only for tracing; the real work is done by the TensorRT plugin.
        return input
    @staticmethod
    def symbolic(g, input, hidden_size, num_heads):
        return g.op("CustomQKVToContextPluginDynamic", input, plugin_version_s='1', type_id_i=0, hidden_size_i=hidden_size, num_heads_i=num_heads, has_mask_i=False)

class MyModule(nn.Module):
    def __init__(self, hidden_size, num_heads):
        super().__init__()  # must run before submodules are assigned
        assert hidden_size % num_heads == 0
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.size_per_head = hidden_size // num_heads
        self.Wq = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wk = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wv = nn.Linear(self.hidden_size, self.hidden_size)
    def forward(self, x):
        # shape of x (seq_len, batch_size, hidden_size)
        # output (seq_len, batch_size, hidden_size)
        Q = self.Wq(x)
        K = self.Wk(x)
        V = self.Wv(x)
        qkv = torch.cat([Q, K, V], dim=2)
        qkv = qkv.view(x.size(0), x.size(1), 3, self.num_heads, self.size_per_head)
        qkv = qkv.transpose(2, 3).contiguous().view(x.size(0), x.size(1), 3*self.hidden_size, 1, 1)
        return CustomQKVToContextPluginDynamic.apply(qkv, self.hidden_size, self.num_heads).select(-1, 0).select(-1, 0)

model = MyModule(768, 8).cuda()#.half()
input = torch.randn(512, 2, 768).cuda()#.half()

from torch.onnx import OperatorExportTypes
torch.onnx.export(model, (input,), 'test.onnx', operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH, input_names=['input_0'], output_names=['output_0'])

This produces an ONNX file, which I then convert into an engine with trtexec.
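
To make the channel layout produced by the view/transpose steps in forward explicit, here is a minimal pure-Python sketch that applies the same reordering to a flat list of labels. The helper name interleave_qkv and the toy sizes (2 heads, 2 values per head) are illustrative assumptions, not part of the plugin or of the script above.

```python
def interleave_qkv(flat, num_heads, size_per_head):
    """Mimic view(3, N, H) -> transpose to (N, 3, H) -> flatten on one channel vector."""
    N, H = num_heads, size_per_head
    assert len(flat) == 3 * N * H
    # After torch.cat([Q, K, V]) the element (t, n, h) sits at t*N*H + n*H + h.
    # Swapping the (3) and (N) axes and flattening row-major visits n, then t, then h.
    return [flat[t * N * H + n * H + h]
            for n in range(N)
            for t in range(3)
            for h in range(H)]

N, H = 2, 2  # toy sizes, chosen only to keep the output readable
labels = [f"{t}{n}{h}" for t in "QKV" for n in range(N) for h in range(H)]
reordered = interleave_qkv(labels, N, H)
print(reordered)
```

With these toy sizes the Q, K and V slices of head 0 come first, followed by the Q, K and V slices of head 1, which is the per-head interleaving the script builds before calling the plugin.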

I found the problem: I forgot to pass the --fp16 flag to trtexec.
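
Since the fix was a missing precision flag, a small sketch like the following can keep the plugin's type_id and the trtexec flags in sync. It assumes that type_id 0 means FP32 and type_id 1 means FP16, which matches the behaviour described in this thread; the helper name build_trtexec_command is hypothetical, and other type_id values are deliberately rejected rather than guessed.

```python
# Assumption: type_id 0 -> FP32 (no extra flag), type_id 1 -> FP16 (--fp16).
PRECISION_FLAGS = {0: [], 1: ["--fp16"]}

def build_trtexec_command(onnx_path, engine_path, type_id):
    """Return the trtexec argument list matching the plugin's type_id."""
    if type_id not in PRECISION_FLAGS:
        raise ValueError(f"unhandled type_id: {type_id}")
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        *PRECISION_FLAGS[type_id],
    ]

cmd = build_trtexec_command("test.onnx", "test.trt", type_id=1)
print(" ".join(cmd))
```

On a machine where trtexec is installed, the list can be passed to subprocess.run(cmd, check=True).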

Now something strange happens: when I change seq_len from 512 to 128, the engine no longer works.

import torch
import torch.nn as nn

# https://github.com/NVIDIA/TensorRT/tree/release/8.5/plugin/bertQKVToContextPlugin
# The yaml file says that version 3 is not supported yet.

class CustomQKVToContextPluginDynamic(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, hidden_size, num_heads):
        # Identity placeholder used only for tracing; the real work is done by the TensorRT plugin.
        return input
    @staticmethod
    def symbolic(g, input, hidden_size, num_heads):
        return g.op("CustomQKVToContextPluginDynamic", input, plugin_version_s='1', type_id_i=1, hidden_size_i=hidden_size, num_heads_i=num_heads, has_mask_i=False)

class MyModule(nn.Module):
    def __init__(self, hidden_size, num_heads):
        super().__init__()  # must run before submodules are assigned
        assert hidden_size % num_heads == 0
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.size_per_head = hidden_size // num_heads
        self.Wq = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wk = nn.Linear(self.hidden_size, self.hidden_size)
        self.Wv = nn.Linear(self.hidden_size, self.hidden_size)
    def forward(self, x):
        # shape of x (seq_len, batch_size, hidden_size)
        # output (seq_len, batch_size, hidden_size)
        Q = self.Wq(x)
        K = self.Wk(x)
        V = self.Wv(x)
        qkv = torch.cat([Q, K, V], dim=2)
        qkv = qkv.view(x.size(0), x.size(1), 3, self.num_heads, self.size_per_head)
        qkv = qkv.transpose(2, 3).contiguous().view(x.size(0), x.size(1), 3*self.hidden_size, 1, 1)
        return CustomQKVToContextPluginDynamic.apply(qkv, self.hidden_size, self.num_heads).select(-1, 0).select(-1, 0)

model = MyModule(768, 8).cuda().half()
input = torch.randn(128, 2, 768).cuda().half()

from torch.onnx import OperatorExportTypes
torch.onnx.export(model, (input,), 'test.onnx', operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH, input_names=['input_0'], output_names=['output_0'])

Running trtexec --onnx=test.onnx --saveEngine=test.trt --fp16 then raises this error:

[02/10/2023-03:43:37] [I] Setting persistentCacheLimit to 0 bytes.
[02/10/2023-03:43:37] [I] Using random values for input input_0
[02/10/2023-03:43:37] [I] Created input binding for input_0 with dimensions 128x2x768
[02/10/2023-03:43:37] [I] Using random values for output output_0
[02/10/2023-03:43:37] [I] Created output binding for output_0 with dimensions 128x2x768
[02/10/2023-03:43:37] [I] Starting inference
[02/10/2023-03:43:37] [F] [TRT] Assertion failed: findIter != mFunctions.end()

Aborted (core dumped)


We were unable to reproduce the issue after changing 512 to 128 in the script above; it works fine for us. Please use the latest TensorRT version, 8.5.3.

[02/14/2023-10:32:40] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8503] # trtexec --onnx=test.onnx --fp16 --verbose --workspace=20000

Thank you.

I see. However, the official Docker container doesn’t include 8.5.3 yet. Hoping for an update!
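
One way to check whether a given container already ships the recommended release is to compare its TensorRT version string against 8.5.3. The sketch below assumes the string is read from tensorrt.__version__ inside the container and may carry an extra build field after the patch number; the helper names and the sample strings are hypothetical.

```python
def parse_version(version):
    """Keep the first three numeric fields, e.g. '8.5.3.1' -> (8, 5, 3)."""
    return tuple(int(part) for part in version.split(".")[:3])

def meets_minimum(installed, required="8.5.3"):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)

# Sample strings for illustration only; inside a container use tensorrt.__version__.
older = meets_minimum("8.5.1.7")
newer = meets_minimum("8.5.3.1")
print(older, newer)
```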
