int8 Calibration Problem with Sigmoid Activation Functions (Name confusion)

Hello,

I’m facing a strange problem that I can’t get solved.

Workflow:

-My colleague trained a model and did an int8 calibration in Python. The model has a Sigmoid activation function at the output (roughly speaking, it’s a U-Net).

-He handed me the exported model as a *.uff file and a separate file with the calibration table.

-I successfully parsed the network with TensorRT in a C++ environment. The model works fine (float32 mode).

-When I activate int8 mode, I get an error message after calibration is done:

misc error nvinfer1::builder could not find scales for tensor conv22/Sigmoid_HL_41

I checked the calibration table file and noticed that this tensor has a different name there (conv22/Sigmoid_HL_).

-I renamed the tensor in the calibration file to conv22/Sigmoid_HL_41.

-Parsing, calibration, and inference now work like a charm in my C++/TensorRT environment.
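For reference, the manual renaming step can be scripted. This is a minimal sketch, assuming the calibration table is a plain-text file with one `tensorName: hexScale` entry per line (the file name and tensor names are just the ones from this thread, not fixed conventions):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Rename one tensor entry in a TensorRT calibration table.
// Assumes the table is plain text with one "tensorName: hexScale" line per entry.
bool renameCalibrationEntry(const std::string& path,
                            const std::string& oldName,
                            const std::string& newName)
{
    std::ifstream in(path);
    if (!in) return false;

    std::ostringstream out;
    std::string line;
    bool renamed = false;
    while (std::getline(in, line))
    {
        // Match only the tensor-name part before the ": " separator,
        // so a name that is a prefix of another name is not touched.
        if (line.rfind(oldName + ":", 0) == 0)
        {
            line = newName + line.substr(oldName.size());
            renamed = true;
        }
        out << line << '\n';
    }
    in.close();

    std::ofstream rewritten(path, std::ios::trunc);
    rewritten << out.str();
    return renamed;
}
```

Called as `renameCalibrationEntry(tablePath, "conv22/Sigmoid_HL_", "conv22/Sigmoid_HL_41")`, it would apply the same fix described above without hand-editing the file.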

So that’s the first part of the problem: why does the tensor’s name differ between my parsed model and the calibration table my colleague generated?

-I tested some more things and noticed the following:
When I parse the model a second time without closing the program in between (with the manually modified calibration table), I run into the same error as above. TensorRT now expects a different tensor name than “conv22/Sigmoid_HL_41” (now: conv22/Sigmoid_HL_18467).

-I noticed the tensor name on the second parse within the same program run is always “conv22/Sigmoid_HL_18467”, regardless of the model, *.uff file, and calibration table.

Why does the tensor name vary within a single run of the program? How can I prevent this and get a constant name for the tensor?
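One possible clue, not confirmed by NVIDIA: 41 and 18467 are exactly the first two values produced by Microsoft’s CRT rand() with its default seed of 1. That would fit a parser that appends a rand() value to disambiguate a duplicated tensor name — the first parse in a process gets _41, the second gets _18467. A minimal sketch that reproduces this sequence by implementing the well-known MSVC LCG constants directly, so it behaves the same on any platform:

```cpp
#include <cstdint>

// Microsoft CRT rand(): a linear congruential generator with
//   state' = state * 214013 + 2531011 (mod 2^32)
//   output = (state' >> 16) & 0x7FFF
struct MsvcRand
{
    std::uint32_t state;
    explicit MsvcRand(std::uint32_t seed = 1u) : state(seed) {}

    int next()
    {
        state = state * 214013u + 2531011u; // unsigned overflow == mod 2^32
        return static_cast<int>((state >> 16) & 0x7FFF);
    }
};

// With the default seed of 1, MsvcRand{}.next() yields 41, then 18467 —
// the same suffixes seen on the first and second parse in this thread.
```

If this is what happens internally, it would also explain why the calibration table generated in Python (a different process, possibly a different seed state) carries a different or missing suffix.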

Thanks for your replies, which hopefully can help me :)

Edit: I’m using TensorRT 5.1.5.0

I met exactly the same problem. I tested with TensorRT 5.0.2 and TensorRT 5.1.5; both have the same error. Hope there is a solution.

Also, I found that it seems to happen only to the output layer.

Hello,

@zackwang: Could you solve the problem?
I’ve tested the newest version of TensorRT (7.0.0.11), but this didn’t solve the problem.

I can confirm that only the output layer is affected.

I solved it by fixing the random seed to 0 just before building the model at runtime; this seems to generate the same random name as the one generated during the calibration stage.
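A sketch of what “fixing the random seed” means here, using srand/rand from the C standard library. The concrete values rand() returns are implementation-defined, but re-seeding with the same value always restarts the same sequence — so if the tensor-name suffix is derived from rand(), a fixed srand() call before the name-generating step would make the suffix reproducible. Exactly where that call has to sit relative to the TensorRT parser/builder calls is an assumption, not something confirmed in this thread:

```cpp
#include <cstdlib>

// Re-seeding the C runtime PRNG restarts its sequence: two calls seeded
// with the same value return identical rand() outputs. A fixed
// std::srand(seed) before the step that generates the tensor name would
// therefore pin the random suffix to a constant value.
int firstRandAfterSeed(unsigned seed)
{
    std::srand(seed);
    return std::rand();
}
```

In a TensorRT program this would translate to a plain `std::srand(0);` before the relevant build step; which of the three placements listed below is the right one is exactly what the follow-up question tries to pin down.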


Thank you for your reply. Sounds interesting.
Can you give me more details about your solution (or even a code snippet)? I’m not that familiar with TensorRT.

-Did you set the random seed after IBuilder* builder = createInferBuilder(gLogger) and before INetworkDefinition* network = builder->createNetwork() ?

or

-Did you set the random seed after parsing the buffer into the network, parser->parseBuffer(uffBuffer, uffBufferSize, *network, dataType), and before setting the calibrator, builder->setInt8Calibrator(calibrator)?

or

-Did you set the random seed right before builder->buildCudaEngine(*network)?

And how did you set the random seed? Is there a function for that?

EDIT:
I noticed you mean the srand(seed) function from the standard library.
I tried setting srand(0) or srand(1) before calling builder->buildCudaEngine(*network), but it didn’t work. The expected layer name at the second call still ends with _18467.

So can you explain to me where exactly you fix the seed?

Thanks in advance!