[gemmBaseRunner.cpp::nvinfer1::rt::task::CaskGemmBaseRunner::executeGemm::455] Error Code 1: Cask (Cask Gemm execution)


According to the debug method mentioned in this issue:

I want to dump an intermediate tensor for debugging, so I marked the MatMul output as a network output in the ONNX-TensorRT source, as the GitHub issue suggested:

Code added at this position: https://github.com/onnx/onnx-tensorrt/blob/0462dc31ae78f48744b6141ae376df1f96d3f459/ModelImporter.cpp#L628

Debug code:

    for (int i = 0; i < graph.node_size(); i++) {
        ::ONNX_NAMESPACE::NodeProto const& node = graph.node(i);

        if (node.output().size() > 0 && node.op_type() == "MatMul") {
            nvinfer1::ITensor* new_output_tensor_ptr = &_importer_ctx.tensors().at(node.output(0)).tensor();
            // Register the MatMul output as a network output so it can be
            // dumped after inference (completion sketch; the original snippet
            // stopped at the line above).
            new_output_tensor_ptr->setName(node.output(0).c_str());
            _importer_ctx.network()->markOutput(*new_output_tensor_ptr);
        }
    }

TensorRT Version:
GPU Type: RTX 4070 Laptop
Nvidia Driver Version: 536.25
CUDA Version: 11.8
CUDNN Version: 8.9.1
Operating System + Version: Windows 11
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

When I run inference with the TensorRT network, I get this error log:

[TRT] Error: 1: [gemmBaseRunner.cpp::nvinfer1::rt::task::CaskGemmBaseRunner::executeGemm::455] Error Code 1: Cask (Cask Gemm execution)

Please refer to the links below for custom plugin implementation and samples:

While the IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively, we recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.