Same memory usage for fp16 and int8

Hello,

I wanted to benchmark a depth estimation model on a Jetson Xavier NX in terms of speed and memory usage. For that purpose I converted the PyTorch model to ONNX and then created TensorRT engines with fp32, fp16, and int8 precision. In terms of speed (FPS) everything looks correct: the fp16 engine is faster than fp32, and the int8 engine is the fastest.
Memory usage is around 1.9 GB for fp32 and around 1.1 GB for both fp16 and int8. The difference between fp32 and fp16 seems reasonable, but I cannot understand why the usage is similar for the fp16 and int8 engines.
Could someone explain whether this behavior is expected?
Could you please advise how I can profile memory usage? (My application is written in Python.)
Is there any method to calculate FLOPs or TOPS for a TensorRT engine?
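At the moment I am only sampling the resident set size of my own process, which doesn't break the usage down inside TensorRT. A rough sketch of what I'm doing (Linux-only, reading /proc; psutil's Process().memory_info().rss would give the same number):

```python
def rss_kib(pid="self"):
    """Return the resident set size (VmRSS) of a process in KiB, Linux only."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:    123456 kB"
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc status")

print(rss_kib())  # prints the current process RSS in KiB
```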

Thanks,
Tigran

Hi,

The memory usage depends on the inference algorithms TensorRT selects.
Lower precision is not guaranteed to use less memory.

However, you can limit the maximum workspace size (in MB) when creating the TensorRT engine.
For example:

$ /usr/src/tensorrt/bin/trtexec --workspace=16 ...
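If you build the engine from Python rather than trtexec, the same limit can be set on the builder config. A minimal sketch for the TensorRT 7.x Python API (network creation and ONNX parsing are elided):

```python
import tensorrt as trt

# Sketch: cap TensorRT's scratch ("workspace") memory when building an
# engine from Python. 16 MB mirrors trtexec's --workspace=16.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.max_workspace_size = 16 << 20  # in bytes; 16 MB

# ... create the network, parse the ONNX model, then:
# engine = builder.build_engine(network, config)
```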

To get FLOPs information, you can use nvprof with the --metrics flag.
For example:

$ sudo /usr/local/cuda-10.2/bin/nvprof --metrics flop_count_sp /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
==30550== Profiling result:
==30550== Metric result:
Invocations                               Metric Name                            Metric Description         Min         Max         Avg
Device "Xavier (0)"
    Kernel: void fused::fusedConvolutionReluKernel<fused::SrcChwcPtr_FltTex_Reader<float, int=1, int=1, int=1, int=1>, fused::KpqkPtrWriter<float, int=1, int=1, int=1>, float, float, int=3, int=4, int=1, int=5, int=5, int=1, int=1>(fused::ConvolutionParams<floatSrcType, int=1, int=1Type>)
         80                             flop_count_sp   Floating Point Operations(Single Precision)     1350272     3075136     2212704
    Kernel: void fused::fusedConvolutionReluKernel<fused::SrcChwcPtr_FltTex_Reader<float, int=1, int=1, int=1, int=2>, fused::KpqkPtrWriter<float, int=1, int=1, int=2>, float, float, int=2, int=4, int=1, int=5, int=5, int=1, int=1>(fused::ConvolutionParams<floatSrcType, int=1, int=1Type>)
         80                             flop_count_sp   Floating Point Operations(Single Precision)     2515072     5737536     4126304
    Kernel: void nvinfer1::tiled_pooling::poolCHW_PQT<int=2, int=2, int=2, int=2, int=2, int=16, int=128, int=1, int=1, bool=1, nvinfer1::ITiledPooling::PoolingMode, bool=0>(nvinfer1::TiledPoolingParams, int)
...

Thanks.


Hi,

Thank you for the explanation, @AastaLLL.
One more question, please: how can I get the number of int8 operations?
I was able to measure the number of fp32 and fp16 operations using the flop_count_sp and flop_count_hp metrics respectively, but I cannot find any metric for int8 operations.

Thanks.

Hi,

You can use the tensor_int_fu_utilization metric mentioned in the document below:
https://docs.nvidia.com/cuda/archive/11.0_GA/profiler-users-guide/index.html#metrics-reference-7x
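For example, with the same nvprof setup as above (note this metric reports tensor-core integer-unit utilization rather than an operation count; the mnist.onnx path and --int8 flag are just placeholders for your own engine):

```shell
sudo /usr/local/cuda-10.2/bin/nvprof --metrics tensor_int_fu_utilization \
    /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --int8
```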

Thanks.