ExecuteV2 freezes after running a few inferences


The application I'm working on runs inferences sequentially, but after a short while (under a minute) execution freezes. The last instruction executed is the call to executeV2.


TensorRT Version: 8
GPU Type: V100
Nvidia Driver Version: 460.91
CUDA Version: 11.2
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Just running inferences for a couple of seconds will freeze the execution completely.

Skeleton TrtEngine::runInference(const std::string &base64, Skeleton old)
{
    // matches the data-URI prefix of a base64-encoded image
    const std::regex regex("^data:image/[a-z]+;base64,");

    if (std::regex_search(base64, regex, std::regex_constants::match_continuous))
    {
        const auto rawImageData = base64_decode(std::regex_replace(base64, regex, ""));
        auto imageData = cv::imdecode(cv::Mat(1, rawImageData.length(), CV_8UC1, const_cast<char *>(rawImageData.c_str())), 1);

        // get sizes of input and output and allocate memory required for input data and for output data
        std::vector<nvinfer1::Dims> input_dims;  // we expect only one input
        std::vector<nvinfer1::Dims> output_dims; // output bindings (two are read below)
        std::vector<void *> buffers(engine->getNbBindings()); // buffers for input and output data
        for (int i = 0; i < engine->getNbBindings(); ++i)
        {
            auto binding_size = getSizeByDim(engine->getBindingDimensions(i)) * sizeof(float);
            cudaMalloc(&buffers[i], binding_size);
            // the branch bodies were lost from the post; reconstructed from how the vectors are used below
            if (engine->bindingIsInput(i))
                input_dims.emplace_back(engine->getBindingDimensions(i));
            else
                output_dims.emplace_back(engine->getBindingDimensions(i));
        }
        preprocessImage(imageData, (float*)buffers[0], input_dims[0]);

        // Here the execution stops
        // (the call itself is missing from the post; `context` is an assumed name for the IExecutionContext)
        context->executeV2(buffers.data());

        auto sk = postprocessResults(buffers[2], buffers[1], output_dims[1], output_dims[0], old);
        for (void* buf : buffers)
            cudaFree(buf);
        return sk;
    }
    return Skeleton();
}
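
The helper getSizeByDim is called above but not shown in the post. A minimal sketch of what it presumably does, assuming it returns the product of all extents of a binding. `DimsSketch` and `getSizeByDimSketch` are hypothetical names used here so that the example compiles without TensorRT; they are not part of the original code. The guard against non-positive extents is an addition: a dynamic dimension is reported as -1, and multiplying it in would produce a meaningless allocation size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for nvinfer1::Dims (assumption: a count plus a fixed array of extents).
struct DimsSketch {
    std::int32_t nbDims;
    std::int32_t d[8];
};

// Assumed behaviour of getSizeByDim: number of elements = product of all extents.
// Returns 0 when any extent is non-positive (for example -1 for a dynamic dimension),
// so the caller can refuse to allocate instead of passing a bogus size to cudaMalloc.
std::size_t getSizeByDimSketch(const DimsSketch &dims) {
    std::size_t size = 1;
    for (std::int32_t i = 0; i < dims.nbDims; ++i) {
        if (dims.d[i] <= 0) {
            return 0;
        }
        size *= static_cast<std::size_t>(dims.d[i]);
    }
    return size;
}
```

With a 1x3x224x224 binding this gives 150528 elements, which the posted code then multiplies by sizeof(float) to get the byte count.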

void TrtEngine::preprocessImage(cv::Mat &image, float *gpu_input, const nvinfer1::Dims &dims)
{
    cv::cuda::GpuMat gpu_frame;
    // upload image to GPU (the call was lost from the post; reconstructed from this comment)
    gpu_frame.upload(image);

    auto input_width = dims.d[3];
    auto input_height = dims.d[2];
    auto channels = dims.d[1];
    auto input_size = cv::Size(input_width, input_height);
    // normalize
    cv::cuda::GpuMat flt_image;
    gpu_frame.convertTo(flt_image, CV_32FC3, 1.f / 255.f);
    cv::cuda::subtract(flt_image, cv::Scalar(0.485f, 0.456f, 0.406f), flt_image, cv::noArray(), -1);
    cv::cuda::divide(flt_image, cv::Scalar(0.229f, 0.224f, 0.225f), flt_image, 1, -1);
    // to tensor: wrap each channel plane of the input buffer and split into it
    std::vector<cv::cuda::GpuMat> chw;
    for (int i = 0; i < channels; ++i)
        chw.emplace_back(cv::cuda::GpuMat(input_size, CV_32FC1, gpu_input + i * input_width * input_height));
    cv::cuda::split(flt_image, chw);
}

Could you try running your model with the trtexec command and share the "--verbose" log if the issue persists?

You can refer to the link below for the list of supported operators. If an operator is not supported, you need to create a custom plugin to support that operation.

Also, please share your model and script, if you have not done so already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below: