I have exported my TensorFlow model to ONNX format and loaded it into TensorRT 6.0 successfully. The model also runs smoothly in TensorRT, in FP16 mode.
The problem came up during profiling: copying the feature tensors from host memory to device memory takes even longer than the inference itself. I realized the cause is that the tensors I copy from CPU to GPU are very large, and they are float32.
My idea was to make those tensors int8 and use a Cast op to convert them to float inside TensorRT. That way only 1/4 of the data would need to be copied. However, this doesn't work.
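To put the 1/4 figure in concrete terms: the host-to-device copy size is element count times bytes per element, so switching from float32 (4 bytes) to int8 (1 byte) shrinks the transfer by 4x whatever the tensor shape is. A minimal C++ sketch of that arithmetic; the shape `{64, 512, 128}` and the names `elementCount`/`copyBytes` are made up for illustration and are not my real feature tensors.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// The 4x ratio below assumes a 4-byte float, which holds on mainstream platforms.
static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");

// Number of elements in a tensor with the given shape (hypothetical helper).
template <std::size_t N>
constexpr std::size_t elementCount(const std::array<std::size_t, N>& shape) {
    std::size_t count = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) count *= shape[i];
    return count;
}

// Bytes crossing the host-to-device copy for a tensor of element type T.
template <typename T, std::size_t N>
constexpr std::size_t copyBytes(const std::array<std::size_t, N>& shape) {
    return elementCount(shape) * sizeof(T);
}

// Hypothetical feature tensor shape, for illustration only.
constexpr std::array<std::size_t, 3> kFeatureShape{{64, 512, 128}};

constexpr std::size_t kFloatBytes = copyBytes<float>(kFeatureShape);        // float32 input
constexpr std::size_t kInt8Bytes  = copyBytes<std::int8_t>(kFeatureShape);  // int8 input
```

For this made-up shape that is 16 MiB per copy as float32 versus 4 MiB as int8.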
I found that in TensorRT 6.0 with the ONNX parser, the Cast op seems to support only casting from FP16 to FP32; casting from int8 to FP32 is not supported. The failure message comes from this code:
// TensorRT only supports the following conversion: FP16 -> FP32.
ASSERT(trt_dtype == nvinfer1::DataType::kHALF && cast_dtype == ::ONNX_NAMESPACE::TensorProto::FLOAT,
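To spell out what that check accepts: the comment states that FP16 -> FP32 is the only supported conversion. Below is a tiny stand-alone C++ model of that rule. It is my own restatement, not the parser's code; `ElemType` and `castSupportedByParser` are hypothetical names and are not part of TensorRT or the ONNX parser.

```cpp
// Simplified stand-ins for the element types involved. These are NOT the real
// nvinfer1::DataType or ONNX TensorProto enums, just labels for this sketch.
enum class ElemType { kFloat32, kFloat16, kInt8, kInt32 };

// Mirrors the rule in the parser comment: the only Cast accepted is
// FP16 -> FP32. Any other pair (e.g. int8 -> FP32) would trip the ASSERT.
constexpr bool castSupportedByParser(ElemType from, ElemType to) {
    return from == ElemType::kFloat16 && to == ElemType::kFloat32;
}
```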
Since the Cast op doesn't work for me, I tried another idea: I added an IIdentityLayer between my input tensors and the layers they connect to. The IIdentityLayer would take the int8 input tensors, convert them to FP32 or FP16, and pass the data downstream. But this doesn't work either, and the error is even odder:
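For reference, the conversion I expect from that step is a plain element-wise widening, which is what an ONNX Cast from int8 to float means. It is lossless: every int8 value (-128 to 127) is exactly representable in float32, and also in FP16, which represents every integer in [-2048, 2048] exactly. A minimal host-side C++ sketch of that expected result; `widenToFloat` and `widenTensor` are hypothetical helpers, and this deliberately does not model how TensorRT itself interprets int8 tensors internally.

```cpp
#include <cstdint>
#include <vector>

// Converts one int8 value to float with no scaling; the result is exact.
constexpr float widenToFloat(std::int8_t v) {
    return static_cast<float>(v);
}

// Host-side reference for the intended identity/cast step:
// element-wise int8 -> float32, same element count, same order.
std::vector<float> widenTensor(const std::vector<std::int8_t>& input) {
    std::vector<float> output;
    output.reserve(input.size());
    for (std::int8_t v : input) output.push_back(widenToFloat(v));
    return output;
}
```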
Using kFLOAT for region surrounded by copy operations: concat:0
traj_rank_evaluator: …/builder/cudnnBuilder2.cpp:1651: std::vector nvinfer1::builder::makePaddedScale(const nvinfer1::builder::Region&, const RegionDynamicRanges*, float): Assertion `regionRanges != nullptr' failed.
Could anyone help me with this? How can I use int8 as the input and then cast it to float inside TensorRT? Is converting int8 to float inside TensorRT 6.0 possible at all?