Hi,
I have just switched from the Caffe framework to TensorRT. I have written a plugin for PReLU, but when I test FP32 against FP16, the speed is not much different.
Thanks.
Hi,
Could you share more information about your use case?
Here are some initial suggestions:
1. Please remember to maximize TX2 performance first:
sudo ./jetson_clocks.sh
2. It’s recommended to use the TensorRT profiler to identify the bottleneck layer; a minimal sketch is included after this list.
Please check our native sample for reference:
/usr/src/tensorrt/samples/sampleGoogleNet/sampleGoogleNet.cpp
3. FP16 halves memory use but does not always double performance.
The time to process a specific layer (e.g., an IP/inner-product layer) may even be longer in FP16 mode.
It is encouraged to compare per-layer performance between FP16 and FP32; see the second sketch below.
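
A minimal per-layer profiler sketch for point 2, assuming a TensorRT release where IProfiler exposes reportLayerTime and you already have a built engine; the LayerProfiler name and the context/buffers variables in the usage comment are placeholders, not part of the TensorRT API:

#include <NvInfer.h>
#include <iostream>
#include <map>
#include <string>

class LayerProfiler : public nvinfer1::IProfiler
{
public:
    // TensorRT invokes this once per layer per inference run,
    // so we accumulate the time under each layer's name.
    void reportLayerTime(const char* layerName, float ms) noexcept override
    {
        mTimings[layerName] += ms;
    }

    void print() const
    {
        for (const auto& kv : mTimings)
            std::cout << kv.first << ": " << kv.second << " ms" << std::endl;
    }

private:
    std::map<std::string, float> mTimings;  // accumulated time per layer name
};

// Usage (context is an existing IExecutionContext*; note that profiling
// may force synchronous execution on older TensorRT releases):
//   LayerProfiler profiler;
//   context->setProfiler(&profiler);
//   context->execute(batchSize, buffers);
//   profiler.print();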
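
And for point 3, a sketch of building the same network in both precisions so the timings can be compared fairly. This assumes a TensorRT 4/5-era builder API (setFp16Mode, buildCudaEngine); older releases use setHalf2Mode instead, and newer ones use config->setFlag(BuilderFlag::kFP16). The buildEngine helper and the workspace size are illustrative choices, not fixed values:

#include <NvInfer.h>
#include <iostream>

// Build the engine twice, once per precision, on the same network
// (builder and network creation omitted for brevity).
nvinfer1::ICudaEngine* buildEngine(nvinfer1::IBuilder* builder,
                                   nvinfer1::INetworkDefinition* network,
                                   bool useFp16)
{
    // Check the hardware actually has a fast native FP16 path (true on TX2).
    if (useFp16 && !builder->platformHasFastFp16())
        std::cout << "Warning: no fast native FP16 on this platform" << std::endl;

    builder->setFp16Mode(useFp16);          // TensorRT <= 3: setHalf2Mode(useFp16)
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 25);  // 32 MB scratch; tune for your model
    return builder->buildCudaEngine(*network);
}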
Thanks.