Help required with TensorRT builder optimization level errors

I am currently facing issues with TensorRT when using higher optimization levels. Specifically, I encounter errors when executing the following command:

trtexec --onnx=a.onnx --saveEngine=save.trt --fp16 --builderOptimizationLevel=5

I have observed that both --builderOptimizationLevel=5 and --builderOptimizationLevel=4 result in errors, whereas the default --builderOptimizationLevel=3 works fine. Additionally, if I remove the EfficientNMS_TRT plugin node from the ONNX model, the command executes successfully even at the higher levels.
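For reference, the behavior can be confirmed with a sweep over the optimization levels along these lines (a minimal sketch: the file names are taken from the command above, and the script only echoes each trtexec invocation so it can be reviewed before actually running the builds):

```shell
#!/bin/sh
# Sweep builder optimization levels to compare which ones fail.
# NOTE: this sketch only prints the trtexec commands; pipe them to
# "sh" (or drop the echo) to actually run the builds.
for level in 3 4 5; do
  cmd="trtexec --onnx=a.onnx --saveEngine=save_level${level}.trt --fp16 --builderOptimizationLevel=${level}"
  echo "$cmd"
done
```

In my case, only the level-3 build from this sweep completes; levels 4 and 5 fail with the errors shown below.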

Below are the errors I am encountering:

[03/19/2025-10:12:51] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +420, now: CPU 0, GPU 444 (MiB)
[03/19/2025-10:12:51] [E] Error[1]: [resizingAllocator.cpp::allocate::74] Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)
[03/19/2025-10:12:51] [W] [TRT] Requested amount of GPU memory (25600 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[03/19/2025-10:12:51] [E] Error[2]: [executionContext.cpp::handleTrainStationRunnerPhase2::256] Error Code 2: OutOfMemory (Requested size was 25600 bytes.)

Could you please provide any suggestions or solutions to address these issues?