TensorRT 9.3 Custom plugins appear to be strangely time-consuming

Description

I implemented a TensorRT plugin and found the plugin to be particularly time-consuming.

I compile the plugin as a separate library and then load it using the C++ API:

void* plugin_handle{ builder->getPluginRegistry().loadLibrary(pluginlib_path_.c_str()) };
// or
void* plugin_handle{ runtime->getPluginRegistry().loadLibrary(pluginlib_path_.c_str()) };

I have now also tried building the plugin as part of the TensorRT source tree, and the result is the same.

I called cudaStreamSynchronize at the beginning of the enqueue function and measured that call to take about 165 ms.

int32_t NmsdetaIPluginV2DynamicExt::enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc, void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) IS_NOEXCEPT
{
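    // Start a host-side timer (assumes `using namespace std::chrono;`).
    auto m_begin = steady_clock::now();

    // Blocks until all work already queued on `stream` has finished.
    cudaStreamSynchronize(stream);

    printf("--->> plugin: %lld, %d\n", static_cast<long long>(duration_cast<microseconds>(steady_clock::now() - m_begin).count()), __LINE__);
    m_begin = steady_clock::now(); // restart the timer with the same clock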

   ...
}
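
One note on the measurement above: cudaStreamSynchronize blocks until all work previously queued on the stream has finished, so a timer wrapped around it at the start of enqueue may be measuring earlier layers rather than the plugin itself. Below is a minimal sketch of that effect in plain C++, with std::async standing in for the CUDA stream. This is an analogy only, not TensorRT or CUDA code, and the names `launchEarlierWork`, `fakeEnqueue` and `Timing` are hypothetical.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

// Result of one simulated enqueue call, in microseconds.
struct Timing
{
    long long waitUs; // time spent blocked on earlier, already queued work
    long long ownUs;  // time spent on the function's own work
};

// Hypothetical stand-in for GPU work queued on the stream by earlier layers.
std::future<void> launchEarlierWork(milliseconds cost)
{
    return std::async(std::launch::async, [cost] { std::this_thread::sleep_for(cost); });
}

// Hypothetical stand-in for a plugin's enqueue(): it first waits for the
// earlier work (the role cudaStreamSynchronize plays in the post), then
// performs its own work, timing the two phases separately.
Timing fakeEnqueue(std::future<void>& earlier, milliseconds ownCost)
{
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    auto const t0 = Clock::now();
    earlier.wait(); // analogous to cudaStreamSynchronize(stream)
    auto const t1 = Clock::now();
    std::this_thread::sleep_for(ownCost); // the function's own work
    auto const t2 = Clock::now();

    return Timing{static_cast<long long>(duration_cast<microseconds>(t1 - t0).count()),
                  static_cast<long long>(duration_cast<microseconds>(t2 - t1).count())};
}
```

Calling `fakeEnqueue` with, say, 50 ms of earlier work and 5 ms of own work reports a wait that is dominated by the earlier work. If the same thing is happening here, one option for timing only the plugin's kernels is to record CUDA events around them (cudaEventRecord, cudaEventElapsedTime) or to look at the timeline in a profiler such as Nsight Systems.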

How can I solve this issue? Please offer me some advice.

Environment

TensorRT Version: 9.3

NVIDIA GPU: GeForce RTX 3090

NVIDIA Driver Version: 535.183.01

CUDA Version: 12.2

cuDNN Version: 8.9.6

Operating System: Ubuntu 22.04

Relevant Files

NmsdetaIPluginV2DynamicExt.cpp.txt (13.7 KB)

NmsdetaIPluginV2DynamicExt.h.txt (4.8 KB)

I am seeing the same thing.
How did you solve this slowdown?

You can refer to this link.
