Following the debug method mentioned in this issue:

I want to dump intermediate tensors for debugging, so I marked the MatMul outputs as network outputs in the ONNX-TensorRT source, as the GitHub issue suggests:

The code was added at this position: https://github.com/onnx/onnx-tensorrt/blob/0462dc31ae78f48744b6141ae376df1f96d3f459/ModelImporter.cpp#L628

Debug code:

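    // Expose the first output of every MatMul node for debugging.
    for (int i = 0; i < graph.node_size(); ++i) {
        ::ONNX_NAMESPACE::NodeProto const& node = graph.node(i);

        if (node.output().size() > 0 && node.op_type() == "MatMul") {
            nvinfer1::ITensor* new_output_tensor_ptr = &_importer_ctx.tensors().at(node.output(0)).tensor();
            // Assumed continuation (the posted snippet ends at the line above): mark it as a network output.
            _importer_ctx.network()->markOutput(*new_output_tensor_ptr);
        }
    }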

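The selection step in the snippet above can be separated from the importer and checked on its own. This is only a minimal sketch: `Node` is a hypothetical stand-in for the two `NodeProto` fields the loop reads, and `collectDebugOutputs` is a hypothetical helper, neither of which exists in ONNX-TensorRT. It returns the first output name of each matching node, skipping empty and duplicate names so that no tensor would be marked twice.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the NodeProto fields used by the debug loop.
struct Node {
    std::string op_type;
    std::vector<std::string> outputs;
};

// Collects the first output name of every node whose op_type equals opType,
// in graph order, skipping nodes without outputs, empty names, and duplicates.
std::vector<std::string> collectDebugOutputs(std::vector<Node> const& nodes,
                                             std::string const& opType) {
    std::vector<std::string> names;
    std::unordered_set<std::string> seen;
    for (Node const& node : nodes) {
        if (node.op_type != opType || node.outputs.empty()) {
            continue;
        }
        std::string const& name = node.outputs.front();
        if (name.empty()) {
            continue;
        }
        // insert() reports whether the name was new, which filters duplicates.
        if (seen.insert(name).second) {
            names.push_back(name);
        }
    }
    return names;
}
```

In the importer, each returned name would then be looked up in `_importer_ctx.tensors()` and the resulting tensor passed to `nvinfer1::INetworkDefinition::markOutput`.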

TensorRT Version:
GPU Type: RTX 4070 Laptop
Nvidia Driver Version: 536.25
CUDA Version: 11.8
CUDNN Version: 8.9.1
Operating System + Version: Windows 11

When I run inference with the resulting TensorRT network, I get the following error log:

[TRT] Error: 1: [gemmBaseRunner.cpp::nvinfer1::rt::task::CaskGemmBaseRunner::executeGemm::455] Error Code 1: Cask (Cask Gemm execution)

