Memory deallocation in TensorRT engine

I’m working with the trtexec sample in TensorRT 4.0.1.6.
I can successfully build an engine from a Caffe model and run inference with that engine using the following two commands (the first builds and serializes the engine, the second loads it and runs inference):

trtexec --deploy=test_network.prototxt --output=prob --model=trained_model.caffemodel --engine=engine
trtexec --output=prob --engine=engine
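For reference, this is roughly what I understand the first command to do internally. It is a simplified sketch based on the TensorRT 4 Caffe parser API, not the actual trtexec source; error handling and the code that writes the serialized blob to disk are omitted, and the batch size / workspace size are placeholder values:

#include <iostream>
#include "NvInfer.h"
#include "NvCaffeParser.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

// Minimal logger required by the TensorRT builder/runtime factories.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Parse the Caffe deploy file and weights into a TensorRT network.
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobs = parser->parse(
        "test_network.prototxt", "trained_model.caffemodel",
        *network, DataType::kFLOAT);
    network->markOutput(*blobs->find("prob"));    // corresponds to --output=prob

    // Build and serialize the engine (the blob is written to the file "engine").
    builder->setMaxBatchSize(1);                  // placeholder value
    builder->setMaxWorkspaceSize(16 << 20);       // placeholder value
    ICudaEngine* engine = builder->buildCudaEngine(*network);
    IHostMemory* serialized = engine->serialize();

    // Cleanup of everything created above.
    serialized->destroy();
    engine->destroy();
    parser->destroy();
    network->destroy();
    builder->destroy();
    shutdownProtobufLibrary();
    return 0;
}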

However, I noticed that the CPU memory allocated during inference is not released, according to the system monitor.
The process still retains <700 MB of memory at the end of the main() function.
I have already checked that the destroy() methods are called in the code.
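Specifically, the inference path (the second command) destroys everything it creates, along the lines of this sketch (again simplified; gLogger is the same ILogger implementation as in the build sketch above, and the device buffer allocation/copy code is omitted):

#include <fstream>
#include <vector>
#include "NvInfer.h"

using namespace nvinfer1;

int main()
{
    // Read the serialized engine produced by the build step.
    std::ifstream file("engine", std::ios::binary | std::ios::ate);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> blob(size);
    file.read(blob.data(), size);

    // Deserialize the engine and create an execution context.
    IRuntime* runtime = createInferRuntime(gLogger);    // gLogger: same ILogger as above
    ICudaEngine* engine = runtime->deserializeCudaEngine(blob.data(), size, nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // ... allocate device buffers and run context->execute(batchSize, bindings) ...

    // Teardown: destroy() is called on every TensorRT object,
    // yet the process memory stays high according to the system monitor.
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}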

Why is the memory not released? Is this expected behavior of TensorRT?
Or is there a way to release the memory immediately?

[library versions]
TensorRT 4.0.1.6
CUDA 8.0
cuDNN 7.1.3