Hi, NV experts:
I have a custom op that is not supported by TensorRT, so I added it to TensorRT as a plugin.
I found the total inference time increased by about 10ms.
My tests were as follows:
- I removed this custom op from my ONNX file and exported it as a .plan file through trtexec; the whole network takes about 50ms;
- I added this custom op (which just cudaMemcpy's a little data) back into my ONNX file and exported it the same way; the whole network takes about 60ms;
- I made my code return immediately in the enqueue function, and the whole network still takes about 60ms. The code looks like this:
int MyPluginDynamic::enqueue(const nvinfer1::PluginTensorDesc* inputDesc,
                             const nvinfer1::PluginTensorDesc* outputDesc,
                             const void* const* inputs, void* const* outputs,
                             void* workspace, cudaStream_t stream) TRT_NOEXCEPT {
    return 0;  // return immediately, without launching any work
}
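In case it is relevant, here is how I plan to profile per-layer timings to see where the extra ~10ms goes (a sketch, assuming my engine file is named model.plan; flag names per current trtexec):

```shell
# Load the already-built engine and dump per-layer timings,
# so the plugin's own cost can be separated from the rest of the network.
trtexec --loadEngine=model.plan \
        --dumpProfile \
        --separateProfileRun
```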
I don’t know why TensorRT’s performance gets worse after I add such a small custom op. My guesses:
- there is some TensorRT behavior I’m not aware of;
- my op introduces extra overhead that I don’t understand.
So, would anyone be willing to teach me this secret?