TensorRT 8.x - recompile bertQKVToContextPlugin for GPT-J


I’m trying to use bertQKVToContextPlugin to accelerate inference of the GPT-J Transformer (EleutherAI/gpt-j-6B · Hugging Face), but the plugin can only be used when head_size <= 64; for GPT-J, a head_size of 256 is required.


TensorRT Version: 8.x

Relevant Files

Please refer to the links below related to custom plugin implementation and a sample:

While the IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively, we recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.


I was able to run this plugin, but the result was an assert “head_size <= 64”. The main problem is that kernels for this plugin are generated only for head sizes of 32 or 64, but I need a head size of 256. How can I generate them?


Currently bertQKVToContextPlugin doesn’t support such a large head_size. Could you please try constructing the MHA part from primitive ops (such as MatMuls and elementwise ops) instead of using the plugin?
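For reference, the primitive-op MHA suggested above can be sketched in NumPy as below. Each step maps to a stock TensorRT layer (the two batched matmuls to IMatrixMultiplyLayer, the scaling to an elementwise/scale layer, the softmax to ISoftMaxLayer), so no plugin is needed. The shapes (16 heads, head_dim 256, sequence length 8) are an assumption matching GPT-J-6B’s configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mha(q, k, v):
    # Scaled dot-product attention over (heads, seq, head_dim) tensors.
    # scores = softmax(Q K^T / sqrt(d)) V, expressed purely with
    # matmuls and elementwise ops, as the plugin-free path requires.
    head_dim = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, seq, seq)
    return softmax(scores, axis=-1) @ v                    # (heads, seq, head_dim)

# Hypothetical GPT-J-6B-like shapes: 16 heads x head_dim 256 = hidden 4096
rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8, 256)).astype(np.float32)
k = rng.standard_normal((16, 8, 256)).astype(np.float32)
v = rng.standard_normal((16, 8, 256)).astype(np.float32)
out = mha(q, k, v)  # shape (16, 8, 256)
```

When building the engine, the same pattern is produced automatically if the ONNX graph keeps attention as plain MatMul/Softmax nodes rather than a fused custom op.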

Thank you.