Can't load TRT engine: throwing an instance of 'nvinfer1::MyelinError'

I accepted your advice and converted the .etlt file inside the TensorRT 20.10 container with:
./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t int8 -e ./lpr_us_onnx_int8.trt

However, when I deploy the resulting .trt engine, the binding shape looks wrong and buffer allocation fails. The error is:
Traceback (most recent call last):
  File "trt_old.py", line 243, in <module>
    inputs, outputs, bindings, stream = allocate_buffers(trt_engine)
  File "trt_old.py", line 66, in allocate_buffers
    host_mem = cuda.pagelocked_empty(size, dtype)
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory

I found that the engine's binding shape is (-1, 3, 96). I don't understand why the batch dimension is -1; my buffer-allocation code can't work with a negative size.
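For what it's worth, the -1 comes from the dynamic-shape profile passed via -p (min 1, opt 4, max 16), so the engine reports a wildcard batch dimension. A common workaround when sizing host/device buffers is to substitute the profile's maximum batch for the -1 before computing the element count, then set the concrete shape on the execution context at inference time. Below is a minimal sketch of just the size computation; buffer_size_for_binding is a hypothetical helper name, not part of the TensorRT API:

```python
import numpy as np

def buffer_size_for_binding(binding_shape, max_batch_size):
    """Replace a dynamic (-1) batch dimension with the profile's max batch
    so the buffer is sized for the worst case, and return (elements, shape)."""
    shape = tuple(max_batch_size if dim == -1 else dim for dim in binding_shape)
    return int(np.prod(shape)), shape

# The engine reports (-1, 3, 48, 96); the conversion profile's max batch is 16.
size, shape = buffer_size_for_binding((-1, 3, 48, 96), 16)
# size is 16 * 3 * 48 * 96 = 221184 elements, shape is (16, 3, 48, 96)
```

The returned size would then be what gets passed to cuda.pagelocked_empty(size, dtype) instead of the raw product of the binding shape, which is negative (or wraps to a huge value) when -1 is present. At inference time the actual batch shape still has to be fixed on the execution context before enqueueing.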
Others have run into the same question, as this post describes: