TensorRT 8.x - recompile bertQKVToContextPlugin for GPT-J


I’m trying to use bertQKVToContextPlugin to accelerate inference of the GPT-J Transformer (EleutherAI/gpt-j-6B · Hugging Face), but the plugin can only be used when header_size <= 64, and GPT-J requires 256.


TensorRT Version: 8.x

I was able to run the plugin, but it fails with the assertion “header_size <= 64”. The main problem is that kernels for this plugin are generated only for header sizes of 32 and 64. I need a header size of 256. How can I generate kernels for it?


Currently, bertQKVToContextPlugin doesn’t support such a large header_size. Could you please try constructing the MHA part from primitive ops (such as matrix multiplications and elementwise ops) instead of using the plugin?
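To illustrate what "primitive ops" means here, below is a minimal NumPy reference sketch of the attention core, not TensorRT code. It assumes Q, K and V are already projected and split per head (GPT-J's rotary position embedding is omitted), and the function name `causal_mha` and tensor layout are hypothetical choices for this example. In a TensorRT network, each step would presumably map to a matrix-multiply, elementwise, or softmax layer, and a reference like this can be used to check the output of the rebuilt subgraph.

```python
import numpy as np


def causal_mha(q, k, v):
    """Attention core built only from matmul, elementwise ops and softmax.

    q, k, v: arrays of shape [num_heads, seq_len, head_size]
    (hypothetical layout chosen for this sketch).
    Returns an array of shape [seq_len, num_heads * head_size].
    """
    num_heads, seq_len, head_size = q.shape

    # 1) Matrix multiply: Q x K^T, then elementwise scale by 1/sqrt(head_size).
    scores = np.matmul(q, k.transpose(0, 2, 1)) / np.sqrt(head_size)

    # 2) Elementwise causal mask: position i may not attend to positions j > i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # 3) Softmax over the key axis (max subtracted for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # 4) Matrix multiply: attention weights x V.
    context = np.matmul(weights, v)

    # 5) Transpose and reshape to merge the heads back into one hidden dimension.
    return context.transpose(1, 0, 2).reshape(seq_len, num_heads * head_size)
```

For example, with 16 heads and a head size of 256 (hidden size 4096), random inputs of shape `[16, seq_len, 256]` produce an output of shape `[seq_len, 4096]`. None of these steps is restricted to a head size of 32 or 64.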

