I implemented a custom deconvolution plugin layer and the error disappeared.
(It is cuDNN based, so I think the logic is the same as the built-in layer.)
But I wonder why the memory access is different between inference and INT8 conversion.
First, I implemented my last layer as a custom plugin layer that used host memory for some of its outputs.
But when I convert the model to INT8, the plugin must use device memory, so I ended up with two code paths: a device-memory version for the conversion and a host-memory version for inference.
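To make the two paths clearer, here is a minimal sketch of the pattern I am describing, not my actual plugin code. It assumes an enqueue-style function that only ever sees device pointers (the names `deconvKernel` and `enqueueDeviceOnly` are hypothetical placeholders for the cuDNN-based deconvolution): the plugin works purely on device memory, and any host copy the inference path needs is done outside the plugin with cudaMemcpyAsync, so the same plugin code could in principle serve both INT8 conversion and inference.

```cpp
// Minimal sketch (assumptions: plugin-style enqueue, hypothetical deconvKernel,
// fixed output size). The plugin only touches device memory; the host copy
// that my inference path needs happens after enqueue, not inside it.
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel standing in for the cuDNN-based deconvolution call.
__global__ void deconvKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];  // placeholder for the real deconvolution math
}

// Plugin-style enqueue: inputs and outputs are always device pointers.
int enqueueDeviceOnly(int n, const float* dIn, float* dOut, cudaStream_t stream)
{
    int block = 256;
    int grid  = (n + block - 1) / block;
    deconvKernel<<<grid, block, 0, stream>>>(dIn, dOut, n);
    return cudaGetLastError() == cudaSuccess ? 0 : -1;
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Run the "plugin" entirely on device memory (works for INT8 conversion too).
    enqueueDeviceOnly(n, dIn, dOut, stream);

    // If the application needs the result on the host (as my original inference
    // path did), copy it back after enqueue instead of inside the plugin.
    float* hOut = nullptr;
    cudaMallocHost(&hOut, bytes);  // pinned host memory for the async copy
    cudaMemcpyAsync(hOut, dOut, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("first output element: %f\n", hOut[0]);

    cudaFreeHost(hOut);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaStreamDestroy(stream);
    return 0;
}
```

With this kind of structure I would only need one plugin implementation, which is why I am asking why the INT8 conversion and inference paths access memory differently in the first place.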