Description
I implemented a TensorRT plugin and found the plugin to be particularly time-consuming.
I compile the plugin as a separate library and load it through the C++ API:
void* plugin_handle{ builder->getPluginRegistry().loadLibrary(pluginlib_path_.c_str()) };
// or
void* plugin_handle{ runtime->getPluginRegistry().loadLibrary(pluginlib_path_.c_str()) };
I have also tried compiling the plugin into the TensorRT source tree, and the result is the same.
I call cudaStreamSynchronize at the beginning of the enqueue function, and that call alone measures about 165 ms.
// Assumes <chrono> is included and `using namespace std::chrono;` is in effect.
int32_t NmsdetaIPluginV2DynamicExt::enqueue(
    PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
    void const* const* inputs, void* const* outputs,
    void* workspace, cudaStream_t stream) IS_NOEXCEPT
{
    // Time only the synchronization at the start of enqueue.
    time_point<high_resolution_clock> m_begin = high_resolution_clock::now();
    cudaStreamSynchronize(stream);
    printf("--->> plugin: %ld, %d\n", duration_cast<microseconds>(high_resolution_clock::now() - m_begin).count(), __LINE__);
    // Restart the timer with the same clock type as m_begin (was system_clock).
    m_begin = high_resolution_clock::now();
    ...
}
How can I solve this issue? Any advice would be appreciated.
Environment
TensorRT Version: 9.3
NVIDIA GPU: GeForce RTX 3090
NVIDIA Driver Version: 535.183.01
CUDA Version: 12.2
cuDNN Version: 8.9.6
Operating System: Ubuntu 22.04
Relevant Files
NmsdetaIPluginV2DynamicExt.cpp.txt (13.7 KB)
NmsdetaIPluginV2DynamicExt.h.txt (4.8 KB)