NVIDIA RTX 2080Ti
I got a mAP drop in FP16 mode, compared with the original FP32 model, when I deployed a pruned YOLOv3 ONNX model as a TensorRT engine.
I convert the ONNX model to FP16 with the following C++ code:
IBuilder* builder = createInferBuilder(gLogger);
nvinfer1::INetworkDefinition* network = builder->createNetwork();
auto parser = nvonnxparser::createParser(*network, gLogger);
// parse step (onnxPath is the path to the pruned YOLOv3 ONNX file)
parser->parseFromFile(onnxPath, static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));
builder->setMaxBatchSize(maxBatchSize);
builder->setMaxWorkspaceSize(1 << 20);
builder->setFp16Mode(true); // the only FP16 setting I use
ICudaEngine* engine = builder->buildCudaEngine(*network);
parser->destroy();
trtModelStream = engine->serialize();
engine->destroy();
network->destroy();
builder->destroy();
It works pretty well in FP32. However, after converting to FP16, some objects in the bottom half of the picture are no longer detected, while detection in the top half still works well. It looks as if the model is accurate in the top half of the picture but not in the bottom half. I am also confused that converting the same FP32 ONNX model several times gives me FP16 engines with different inference results. Is there any stochastic process in the TensorRT conversion? And is there any other FP16-related setting I can use besides 'builder->setFp16Mode(true)'?
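On the last question: the only other FP16-related controls I can find in this builder API (TensorRT 5.x era) are the fast-FP16 platform check, strict type constraints, and per-layer precision. A sketch of how they would be set, untested in my setup (`builder` and `network` are the objects from the code above; layer index 0 is only an example):

```cpp
// Sketch: other FP16-related builder controls in this API version.
if (builder->platformHasFastFp16()) {        // does the GPU have native FP16?
    builder->setFp16Mode(true);
    builder->setStrictTypeConstraints(true); // honor requested precisions
}

// Per-layer precision: pin a suspect layer back to FP32, e.g. if one
// layer is suspected of overflowing in FP16.
nvinfer1::ILayer* layer = network->getLayer(0);
layer->setPrecision(nvinfer1::DataType::kFLOAT);
layer->setOutputType(0, nvinfer1::DataType::kFLOAT);
```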
I have also experimented a lot with INT8 conversion, but on the RTX 20 series it does not give much FPS advantage over FP16, while the mAP loss is much larger.
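For reference, the INT8 path I mean follows the usual builder pattern (a sketch; MyInt8Calibrator is a placeholder name for an implementation of nvinfer1::IInt8EntropyCalibrator2 fed with representative input batches):

```cpp
// Sketch: enabling INT8 requires a calibrator that feeds representative
// input batches so TensorRT can pick per-tensor dynamic ranges.
// MyInt8Calibrator is a placeholder for an IInt8EntropyCalibrator2 subclass.
MyInt8Calibrator calibrator(calibrationBatches);
builder->setInt8Mode(true);
builder->setInt8Calibrator(&calibrator);
ICudaEngine* engine = builder->buildCudaEngine(*network);
```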