Using Jetson Xavier with JetPack 4.1.1, TensorRT 5.0, and the Caffe model parser.
Hello,
I’m running inference with a face detection network at batch size 1; some custom plugin layers were created to implement the PriorBox layer. From the results, inference in FP32 (FLOAT) mode runs slightly faster than in FP16 (HALF) mode.
Is this expected, and what could cause it? If not, what details did I miss when implementing the half-precision mode?
The results are:
engine is FP16 mode
image inference consume time: 15.5089ms
image preprocess consume time: 67.05ms
engine is FP32 mode
image inference consume time: 14.7815ms
image preprocess consume time: 63.854ms
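For reference, the inference time above is measured with CUDA events around the asynchronous enqueue call, roughly like this (a simplified sketch; context, buffers, and stream stand for my actual execution context, device bindings, and CUDA stream):

#include <cuda_runtime.h>
#include <iostream>

// Time only the GPU work submitted between the two events.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, stream);
context->enqueue(1, buffers, stream, nullptr);  // batch size 1
cudaEventRecord(stop, stream);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
std::cout << "image inference consume time: " << ms << "ms" << std::endl;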
The code I use to build the FP16 engine:
bool useFp16 = builder->platformHasFastFp16();
DataType modelDataType = useFp16 ? DataType::kHALF : DataType::kFLOAT;
const IBlobNameToTensor* blobNameToTensor =
    parser->parse(locateFile(deployFile, directories).c_str(),
                  locateFile(modelFile, directories).c_str(),
                  *network, modelDataType);

// specify which tensors are outputs
for (auto& s : outputs)
{
    network->markOutput(*blobNameToTensor->find(s.c_str()));
}

// set batch size and workspace
builder->setMaxBatchSize(maxBatchSize);
builder->setMaxWorkspaceSize(36 << 20);  // 36 MB

// enable FP16 mode and force the requested precision
builder->setFp16Mode(useFp16);
builder->setStrictTypeConstraints(useFp16);

ICudaEngine* engine = builder->buildCudaEngine(*network);
assert(engine);
……
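In case it helps with diagnosis, per-layer timings could be compared between the two engines with a profiler (a minimal sketch using the IProfiler interface; note that in TensorRT 5 profiling only works with the synchronous execute() call, not with enqueue()):

#include <iostream>
#include "NvInfer.h"

// Print the time of each layer so FP16 and FP32 engines can be
// compared layer by layer.
struct LayerProfiler : public nvinfer1::IProfiler
{
    void reportLayerTime(const char* layerName, float ms) override
    {
        std::cout << layerName << ": " << ms << "ms" << std::endl;
    }
};

LayerProfiler profiler;
context->setProfiler(&profiler);
context->execute(1, buffers);  // batch size 1, synchronous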
By the way, I maximized the CPU/GPU clocks beforehand, and the plugin layer has already been changed to output half precision instead of float when the engine runs in FP16 mode.
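For completeness, the format handling of the custom plugin looks roughly like this (a sketch assuming an IPluginExt-based plugin; the class name PriorBoxPlugin and the mDataType member are placeholders for my actual implementation):

// Advertise FP32 and FP16 support in linear NCHW layout.
bool PriorBoxPlugin::supportsFormat(nvinfer1::DataType type,
                                    nvinfer1::PluginFormat format) const
{
    return (type == nvinfer1::DataType::kFLOAT || type == nvinfer1::DataType::kHALF)
        && format == nvinfer1::PluginFormat::kNCHW;
}

void PriorBoxPlugin::configureWithFormat(const nvinfer1::Dims* inputDims, int nbInputs,
                                         const nvinfer1::Dims* outputDims, int nbOutputs,
                                         nvinfer1::DataType type, nvinfer1::PluginFormat format,
                                         int maxBatchSize)
{
    // Remember the selected precision so enqueue() writes half instead of float.
    mDataType = type;
}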