How to convert int8 input to float32 inside the network, without int8 calibration


I have created a network, and I found that the CPU<->GPU data copy takes a lot of time. I have tried unified memory and pinned memory, but they don't help, so I want to reduce the size of the memory transfers between CPU and GPU. The input image format is uint8, which I can convert to int8 outside the network and feed in as int8 input; then inside TensorRT I would subtract the mean and convert the data type to float32, run the main part of the inference in float32 as usual, and finish with an argmax layer whose output can also be int8. How can I do this? I have looked at the sample that adds int8 I/O, but I found the hardest part is converting the data type from int8 to float32. Is there any help?
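Concretely, the per-pixel math I want the network to do internally is just this (a sketch of my intent, not working TensorRT code; `mean` is a placeholder for the dataset mean, and I assume the host-side uint8-to-int8 conversion simply subtracts 128):

```cpp
#include <cstdint>

// Sketch of the intended in-network math: the int8 input (host pixel - 128)
// is widened to float32, the host-side shift is undone, and the dataset mean
// is subtracted; inference then proceeds in float32 as usual.
float preprocess(int8_t q, float mean)
{
    return static_cast<float>(q) + 128.0f - mean; // == original uint8 pixel - mean
}
```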


Hi @niujiaopeng
You could set the input dynamic range to [-128, 127], explicitly specify that all layers run in FP32 precision with the strict-types flag, and use the reformat-free I/O feature. But you may still need to set dynamic ranges for all tensors (perhaps all to [-128, 127]).
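A quick arithmetic sketch of why [-128, 127] is a natural choice: with symmetric quantization the per-tensor scale is max(|min|, |max|) / 127, so that range makes the int8-to-float mapping close to the identity (illustrative code only, not TensorRT internals):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Symmetric quantization as I understand it: a tensor with dynamic range
// [rmin, rmax] gets scale = max(|rmin|, |rmax|) / 127, and an int8 code q
// dequantizes to q * scale.  With range [-128, 127] the scale is 128/127,
// i.e. each int8 code maps almost exactly onto its own integer float value.
float int8ToFloat(int8_t q, float rmin, float rmax)
{
    const float scale = std::max(std::fabs(rmin), std::fabs(rmax)) / 127.0f;
    return q * scale;
}
```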


Thank you very much. In sampleReformatFreeIO.cpp, in the readDigits function, around lines 439 to 444:
// Reinterpret the input buffer as float and widen each uint8 pixel to float32.
float* inputBuf = reinterpret_cast<float*>(buffer.buffer);

for (int i = 0; i < inputH * inputW; i++)
    inputBuf[i] = float(fileData[i]);

the code reads the PGM into a uint8 buffer first and still uses a float-typed input, but I want to feed int8 data into TensorRT first, then convert the input to float32 inside the network and run inference. Could you help me, please?

Hi @niujiaopeng,
We recommend you simply use reformat-free I/O and set the dynamic ranges appropriately (in int8 mode).
Even if you just want to feed int8 data and want the next layer to consume it as float, you can use the setPrecision API to set the next layer's precision to float.
TRT will introduce a copy (int8 to float) internally, so there is no escaping the copy operation.
If performance is a concern, int8 I/O with dynamic ranges / calibration is recommended.
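If you want to set dynamic ranges without running a calibrator, one simple calibration-free heuristic (my own illustration; the entropy calibrator is the usual choice, and this function is not part of TensorRT) is to take the maximum absolute value seen over representative activations:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Heuristic per-tensor dynamic range: the maximum absolute activation value
// observed over a set of representative inputs.  The result m would then be
// fed to the tensor's setDynamicRange(-m, m).
float maxAbsRange(const std::vector<float>& activations)
{
    float m = 0.0f;
    for (float v : activations)
        m = std::max(m, std::fabs(v));
    return m;
}
```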
