How can I use int8-typed inputs in TensorRT 6.0?


I have exported my TensorFlow model to ONNX format and loaded it into TensorRT 6.0 successfully. The model runs smoothly in TensorRT, and I am using FP16 mode.

The problem showed up during profiling: copying the feature tensors from host memory to device memory takes even longer than inference itself. I realized that the tensors being copied from CPU to GPU are too big, and they are float32.

My idea was to make those tensors int8 and use a Cast op to convert them to float inside TensorRT. That way only 1/4 of the data needs to be copied. However, this doesn’t work.
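To make the 1/4 figure concrete, here is a minimal host-side sketch in C++. It assumes the features can tolerate symmetric int8 quantization with a single caller-chosen `scale`, which the post above does not confirm. The helper names (`quantize_to_int8`, `dequantize_to_float`, `payload_bytes`) are hypothetical and not part of TensorRT. The dequantize function is only a CPU reference for the element-wise int8 to float conversion that would have to happen on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: symmetric quantization of float32 features to int8.
// `scale` is the float value represented by one int8 step (an assumption
// of this sketch; the caller must pick it for their own data).
std::vector<std::int8_t> quantize_to_int8(const std::vector<float>& src, float scale) {
    assert(scale > 0.0f);
    std::vector<std::int8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        float q = std::round(src[i] / scale);
        // Clamp to the symmetric int8 range so out-of-range values saturate.
        q = std::min(127.0f, std::max(-127.0f, q));
        dst[i] = static_cast<std::int8_t>(q);
    }
    return dst;
}

// CPU reference for the element-wise int8 -> float32 conversion that the
// device would need to perform after the (smaller) host-to-device copy.
std::vector<float> dequantize_to_float(const std::vector<std::int8_t>& src, float scale) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = static_cast<float>(src[i]) * scale;
    }
    return dst;
}

// Number of bytes that would have to be copied for a given buffer.
template <typename T>
std::size_t payload_bytes(const std::vector<T>& v) {
    return v.size() * sizeof(T);
}
```

With the same element count, `payload_bytes` of the int8 buffer is a quarter of the float32 buffer, which is where the expected saving in copy time comes from. The price is a rounding error of up to half a `scale` step per element, so whether this is acceptable depends on the features.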

  1. In TensorRT 6.0 with the ONNX parser, the Cast op seems to support only casting from FP16 to FP32; it does not support casting from int8 to FP32. The failure comes from this check in the parser code:
    // TensorRT only supports the following conversion: FP16 → FP32.
    ASSERT(trt_dtype == nvinfer1::DataType::kHALF && cast_dtype == ::ONNX_NAMESPACE::TensorProto::FLOAT,

  2. Since the Cast op didn’t work for me, I tried another idea: I added an IIdentityLayer between each input tensor and the layers it connects to. The intent was for the IIdentityLayer to take the int8 input tensors, convert them to FP32 or FP16, and pass the data downstream. This doesn’t work either, and the error is even odder:
    Using kFLOAT for region surrounded by copy operations: concat:0
    traj_rank_evaluator: …/builder/cudnnBuilder2.cpp:1651: std::vector nvinfer1::builder::makePaddedScale(const nvinfer1::builder::Region&, const RegionDynamicRanges*, float): Assertion `regionRanges != nullptr’ failed.

Could anyone help me with this issue? How can I use int8 as input and then cast it to float inside TensorRT? Is it possible to convert int8 to float inside TensorRT 6.0?

Lu Chen

No one has replied so far, so I am posting my latest test results here:

I thought this might be an unsupported feature in TensorRT 6.0, so I tried TensorRT 7.0. Unfortunately, it doesn’t work in 7.0 either.

This is quite strange to me: the TensorRT website marks the Cast op as supported, but in practice it doesn’t really work, or at least it works only with limitations (such as supporting only casts from FP16 to FP32). When you say something is supported, could you please make sure it is fully working, or at least tell us which parts are not?

So far my guess is that something inside the TensorRT kernel is not working. Could any NVIDIA engineer give me a definite answer? Thank you.


Please refer to the links below:


Thanks for your reply. I did go through this section before, but I didn’t actually try that approach. Eventually I resolved the issue by writing a CUDA kernel that does the cast. I will still give your recommendation a try.


Could you share the CUDA kernel that you wrote? I’m facing the exact same issue and that would be really helpful!

