Xavier TensorRT MNIST: FP16 is slower than FP32?

I modified the TensorRT sampleUffMNIST to test FP32 and FP16; the code looks like this:

if (gUseFp16) {
    std::cout << "use fp16" << std::endl;
    // Parse the UFF model with FP16 weights and enable FP16 kernels in the builder.
    if (!parser->parse(uffFile, *network, nvinfer1::DataType::kHALF))
        RETURN_AND_LOG(nullptr, ERROR, "Fail to parse");
    builder->setFp16Mode(true);
} else {
    std::cout << "use fp32" << std::endl;
    if (!parser->parse(uffFile, *network, nvinfer1::DataType::kFLOAT))
        RETURN_AND_LOG(nullptr, ERROR, "Fail to parse");
}
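
One check that may be worth adding before enabling FP16 is whether the device reports fast FP16 support at all. A minimal sketch, reusing the same builder and gUseFp16 flag as above (platformHasFastFp16() is part of nvinfer1::IBuilder):

// Sketch: warn if FP16 is requested but the platform reports no fast FP16 path.
if (gUseFp16 && !builder->platformHasFastFp16())
    std::cout << "warning: platform reports no fast fp16 support" << std::endl;

On Xavier this check should pass, so it does not explain the slowdown by itself; it mainly helps when the same code is run on other boards.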

I get the following output:

./sample_uff_mnist --fp16

…/data/mnist/lenet5.uff
use fp16
run[0] use 0.62032 ms.
run[1] use 0.5256 ms.
run[2] use 0.632768 ms.
run[3] use 0.579488 ms.
run[4] use 0.59328 ms.
run[5] use 0.541376 ms.
run[6] use 0.582144 ms.
run[7] use 0.58704 ms.
run[8] use 0.58224 ms.
run[9] use 0.56464 ms.
Average over 10 runs is 0.58089 ms.

./sample_uff_mnist

…/data/mnist/lenet5.uff
use fp32
run[0] use 0.5016 ms.
run[1] use 0.55312 ms.
run[2] use 0.463584 ms.
run[3] use 0.35792 ms.
run[4] use 0.399872 ms.
run[5] use 0.5304 ms.
run[6] use 0.383392 ms.
run[7] use 0.531232 ms.
run[8] use 0.45376 ms.
run[9] use 0.44672 ms.
Average over 10 runs is 0.46216 ms.

Hi,

Have you maximized the device performance before profiling?

sudo nvpmodel -m 0
sudo jetson_clocks

Thanks.

Yes, I had already maximized the device performance.

Hi,

Which JetPack version do you use?
If you haven't tried v4.2.1 yet, would you mind giving it a try first?

We will also try to reproduce this issue internally.
We will update you with more information once we find anything.

Thanks.

I use JetPack 4.2.

head -n 1 /etc/nv_tegra_release

R31 (release), REVISION: 1.0, GCID: 13194883, BOARD: t186ref, EABI: aarch64, DATE: Wed Oct 31 22:26:16 UTC 2018

I got the deb packages from https://developer.nvidia.com/assets/embedded/secure/tools/files/jetpack-sdks/jetpack-4.2/JETPACK_42_b158/P2888/ and installed them:

libcudnn7_7.3.1.28-1+cuda10.0_arm64.deb
libcudnn7-dev_7.3.1.28-1+cuda10.0_arm64.deb
libcudnn7-doc_7.3.1.28-1+cuda10.0_arm64.deb
libnvinfer5_5.0.6-1+cuda10.0_arm64.deb
libnvinfer-dev_5.0.6-1+cuda10.0_arm64.deb
libnvinfer-samples_5.0.6-1+cuda10.0_all.deb
tensorrt_5.0.6.3-1+cuda10.0_arm64.deb

Hi,

We have already passed this issue to our internal team.
We will update you once we get any feedback.

By the way, we reproduced this issue with JetPack 4.2.1 and got much better performance than yours:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
fp32: Average over 10 runs is 0.288379 ms
fp16: Average over 10 runs is 0.371939 ms

It's worth giving JetPack 4.2.1 a try.
Thanks.

Thanks.

I reflashed the system with JetPack 4.2.1 and ran nvpmodel -m 0 and jetson_clocks.

head -n 1 /etc/nv_tegra_release

R32 (release), REVISION: 2.0, GCID: 15966166, BOARD: t186ref, EABI: aarch64, DATE: Wed Jul 17 00:26:04 UTC 2019

nvpmodel -q --verbose

NV Fan Mode:quiet
NV Power Mode: MAXN
0

I commented out some prints and added a print of each context->execute() time (a rough sketch of that timing change is shown after the results below); the test results look like this:

root@nvidia-desktop:/opt/project/nvidia/xavier/tensorrt/bin# ./sample_uff_mnist
&&&& RUNNING TensorRT.sample_uff_mnist # ./sample_uff_mnist
[I] …/data/mnist/lenet5.uff
[I] runInFp16=[0] runInInt8=[0]
[I] run[0] use 0.814376 ms.
[I] run[1] use 0.531791 ms.
[I] run[2] use 0.412644 ms.
[I] run[3] use 0.344734 ms.
[I] run[4] use 0.398851 ms.
[I] run[5] use 0.397379 ms.
[I] run[6] use 0.362752 ms.
[I] run[7] use 0.328668 ms.
[I] run[8] use 0.218451 ms.
[I] run[9] use 0.424869 ms.
[I] Average over 10 runs is 0.423451 ms.
&&&& FAILED TensorRT.sample_uff_mnist # ./sample_uff_mnist
root@nvidia-desktop:/opt/project/nvidia/xavier/tensorrt/bin# ./sample_uff_mnist --fp16
&&&& RUNNING TensorRT.sample_uff_mnist # ./sample_uff_mnist --fp16
[I] …/data/mnist/lenet5.uff
[I] runInFp16=[1] runInInt8=[0]
[I] run[0] use 0.932015 ms.
[I] run[1] use 0.56856 ms.
[I] run[2] use 0.553006 ms.
[I] run[3] use 0.425444 ms.
[I] run[4] use 0.449958 ms.
[I] run[5] use 0.487913 ms.
[I] run[6] use 0.527181 ms.
[I] run[7] use 0.505802 ms.
[I] run[8] use 0.4746 ms.
[I] run[9] use 0.516428 ms.
[I] Average over 10 runs is 0.544091 ms.
&&&& FAILED TensorRT.sample_uff_mnist # ./sample_uff_mnist --fp16
root@nvidia-desktop:/opt/project/nvidia/xavier/tensorrt/bin# ./sample_uff_mnist --int8
&&&& RUNNING TensorRT.sample_uff_mnist # ./sample_uff_mnist --int8
[I] …/data/mnist/lenet5.uff
[I] runInFp16=[0] runInInt8=[1]
[W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[I] run[0] use 0.762366 ms.
[I] run[1] use 0.495273 ms.
[I] run[2] use 0.409698 ms.
[I] run[3] use 0.388992 ms.
[I] run[4] use 0.418018 ms.
[I] run[5] use 0.3824 ms.
[I] run[6] use 0.438916 ms.
[I] run[7] use 0.421731 ms.
[I] run[8] use 0.420898 ms.
[I] run[9] use 0.417058 ms.
[I] Average over 10 runs is 0.455535 ms.
&&&& FAILED TensorRT.sample_uff_mnist # ./sample_uff_mnist --int8
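
For reference, the per-run timing print mentioned above is roughly the following (a minimal sketch; context, batchSize, and buffers stand for the sample's existing execution context and device bindings, and the run count of 10 just mirrors the output above):

#include <chrono>
#include <iostream>

// Sketch: IExecutionContext::execute() is synchronous, so wall-clock time
// around the call covers the whole inference run.
for (int run = 0; run < 10; ++run)
{
    auto start = std::chrono::high_resolution_clock::now();
    context->execute(batchSize, buffers);
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> ms = end - start;
    std::cout << "run[" << run << "] use " << ms.count() << " ms." << std::endl;
}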

Hi,

The order in which the commands are executed leads to different behavior.
Please set the power mode first and then lock the clocks to the maximum frequency:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

We are still checking why FP16 runs slower than FP32 mode.
Will update more information here once we find something.

Thanks.

OK, thanks.

Hi,

Sorry for keeping you waiting.
It's recommended to use trtexec for performance analysis rather than sample_uff.
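
For example, something along these lines should give cleaner numbers (a sketch only: the tensor names passed to --uffInput and --output are assumed to be the "in"/"out" names the MNIST sample registers for lenet5.uff, so adjust them to match your model, and check ./trtexec --help for the exact options in your TensorRT release):

$ ./trtexec --uff=../data/mnist/lenet5.uff --uffInput=in,1,28,28 --output=out
$ ./trtexec --uff=../data/mnist/lenet5.uff --uffInput=in,1,28,28 --output=out --fp16

trtexec averages over many iterations, which helps smooth out the per-run noise visible in the ten-run averages above.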

Thanks.