Pre-processing of fp32 inputs fed to a cuDLA application

I am learning how to program with cuDLA from the CUDA samples cuDLAHybridMode and cuDLAStandaloneMode.
First, I compile a DLA loadable with:

trtexec --deploy=/usr/src/tensorrt/data/mnist/mnist.prototxt --model=/usr/src/tensorrt/data/mnist/mnist.caffemodel --output=ip2 --useDLACore=0 --int8 --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32 --saveEngine=./mnist_int8.bin --buildOnly --safe

mnist.caffemodel is an fp32 model; after being compiled by TensorRT it becomes an int8 model. Running cuDLAStandaloneMode with mnist_int8.bin as input prints the input and output tensor descriptors as below:

Printing input tensor descriptor
        TENSOR NAME : data'
        size: 25088
        dims: [1, 1, 28, 28]
        data fmt: 3
        data type: 4
        data category: 2
        pixel fmt: 36
        pixel mapping: 0
Printing output tensor descriptor
        TENSOR NAME : ip2'
        size: 32
        dims: [1, 10, 1, 1]
        data fmt: 3
        data type: 4
        data category: 2
        pixel fmt: 36
        pixel mapping: 0

As we can see, data type 4 means int8. My question: when an fp32 model is compiled and quantized with TensorRT, do its input and output become int8? If so, how should I handle the input, since I don't know the specific quantization parameters?
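
For reference, these descriptors are obtained by querying the loaded module with cudlaModuleGetAttributes. Roughly, the pattern looks like the outline below (an illustrative sketch following the sample's approach; check the exact struct and enum names against cudla.h):

#include <cudla.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Sketch: query and print the input tensor descriptors of a loaded cuDLA module.
// 'moduleHandle' is assumed to come from a successful cudlaModuleLoadFromMemory call.
void printInputDescriptors(cudlaModule moduleHandle)
{
    cudlaModuleAttribute attr;

    // Number of input tensors in the loadable.
    if (cudlaModuleGetAttributes(moduleHandle, CUDLA_NUM_INPUT_TENSORS, &attr) != cudlaSuccess)
        return;
    uint32_t numInputs = attr.numInputTensors;

    // The caller provides storage for the descriptors, then queries them.
    cudlaModuleTensorDescriptor* desc = (cudlaModuleTensorDescriptor*)
        std::malloc(sizeof(cudlaModuleTensorDescriptor) * numInputs);
    attr.inputTensorDesc = desc;
    if (cudlaModuleGetAttributes(moduleHandle, CUDLA_INPUT_TENSOR_DESCRIPTORS, &attr) == cudlaSuccess)
    {
        for (uint32_t i = 0; i < numInputs; ++i)
        {
            // 'size' is the byte size the DLA expects for this tensor and already
            // includes the chw32 channel padding (28 * 28 * 32 = 25088 above).
            std::printf("name=%s size=%llu type=%d fmt=%d\n",
                        desc[i].name,
                        (unsigned long long)desc[i].size,
                        (int)desc[i].dataType,
                        (int)desc[i].dataFormat);
        }
    }
    std::free(desc);
}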

Hi,

It’s controlled by the trtexec flags.

If you remove the --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32 flags, the input/output format will default to float32.
A data-format conversion layer will be inserted automatically to convert between the float32 input/output and the int8 model.

Thanks.

Thanks for your reply.
I tried removing --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32, and TensorRT reported the error below. It seems that --inputIOFormats and --outputIOFormats are required when compiling a DLA loadable binary.

&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # trtexec --deploy=/usr/src/tensorrt/data/mnist/mnist.prototxt --model=/usr/src/tensorrt/data/mnist/mnist.caffemodel --output=prob --useDLACore=0 --fp16 --memPoolSize=dlaSRAM:1 --saveEngine=./mnist_fp16.bin --buildOnly --safe
[02/24/2023-11:40:38] [E] I/O formats for safe DLA capability are restricted to fp16/int8:linear, fp16:chw16 or int8:chw32

I also tried removing the --safe flag at the same time, and the TensorRT build then passed.
But loading the resulting file with cudlaModuleLoadFromMemory returned an error mask of 7.
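
The load path in question follows the sample and looks roughly like this (an illustrative sketch with error handling trimmed; check the exact names against cudla.h):

#include <cudla.h>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

// Sketch: read a loadable from disk and hand it to cuDLA (hybrid mode).
int loadLoadable(const char* path)
{
    // Read the whole file into memory.
    std::ifstream file(path, std::ios::binary);
    std::vector<uint8_t> blob((std::istreambuf_iterator<char>(file)),
                              std::istreambuf_iterator<char>());

    cudlaDevHandle dev = nullptr;
    cudlaModule module = nullptr;

    // DLA core 0 in hybrid (CUDA + DLA) mode; cuDLAStandaloneMode uses CUDLA_STANDALONE instead.
    cudlaStatus status = cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA);
    if (status != cudlaSuccess)
    {
        std::fprintf(stderr, "cudlaCreateDevice failed: %d\n", (int)status);
        return -1;
    }

    // The call that reports the error mask mentioned above.
    status = cudlaModuleLoadFromMemory(dev, blob.data(), blob.size(), &module, 0);
    if (status != cudlaSuccess)
    {
        std::fprintf(stderr, "cudlaModuleLoadFromMemory failed: %d\n", (int)status);
        cudlaDestroyDevice(dev);
        return -1;
    }

    // ... register memory and submit tasks here ...

    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}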

Gently pinging @AastaLLL, sorry to disturb. Do you have any suggestions on my question?

Hi,

Sorry for the late update and the unclear comment previously.

If you remove the IOFormats flags, TensorRT will insert the conversion layer automatically.
So the model will have a dependency on TensorRT and needs to be built without the standalone flag (--safe).
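
For clarity, an engine built that way is consumed through the regular TensorRT runtime rather than through cuDLA. A minimal sketch of that path (TensorRT 8-style API, error handling trimmed; not code from the samples above):

#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::printf("[TRT] %s\n", msg);
    }
};

// Sketch: deserialize a non-safe engine containing DLA layers and run it with
// the TensorRT runtime, which handles the fp32 <-> int8 reformatting and the
// DLA dispatch internally.
int runWithTensorRT(const char* enginePath)
{
    std::ifstream file(enginePath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    runtime->setDLACore(0);  // same DLA core the engine was built for

    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    if (engine == nullptr)
        return -1;

    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    // ... set up fp32 input/output device buffers and enqueue inference ...

    delete context;  // TensorRT 8: objects are released with delete
    delete engine;
    delete runtime;
    return 0;
}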

If a DLA standalone loadable is preferred, please follow the doc below to handle the input and output:

Thanks.

Thank you for your reply. I have read the documentation you mentioned, but it only specifies the allowed I/O formats for DLA standalone mode. It does not explain how to convert fp32 input to int8. Could you please provide some guidance on this?

Hi,

Sorry for keeping you waiting.

Below are the data format details for your reference:
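
As an illustrative sketch (not taken from the linked doc): assuming the input tensor's quantization scale is known from the build-time quantization information (for example the calibration cache, with scale roughly dynamicRange / 127), the fp32 to int8:chw32 conversion can be done along these lines. The scale in the usage comment is a placeholder:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative fp32 -> int8:chw32 conversion for a single CHW tensor.
// 'scale' is the per-tensor input quantization scale and must come from the
// calibration / dynamic-range data used when the engine was built.
std::vector<int8_t> toInt8Chw32(const std::vector<float>& src,
                                int C, int H, int W, float scale)
{
    const int cPadded = ((C + 31) / 32) * 32;  // channels padded up to a multiple of 32
    std::vector<int8_t> dst(static_cast<size_t>(cPadded) * H * W, 0);

    for (int c = 0; c < C; ++c)
        for (int h = 0; h < H; ++h)
            for (int w = 0; w < W; ++w)
            {
                // Symmetric quantization: q = clamp(round(x / scale), -128, 127).
                float x = src[(static_cast<size_t>(c) * H + h) * W + w];
                int q = static_cast<int>(std::lround(x / scale));
                q = std::max(-128, std::min(127, q));

                // chw32 layout: channels are grouped in blocks of 32 that are
                // interleaved per pixel. For MNIST (C = 1) this yields the
                // 28 * 28 * 32 = 25088-byte buffer shown in the descriptor above.
                size_t idx = ((static_cast<size_t>(c / 32) * H + h) * W + w) * 32 + (c % 32);
                dst[idx] = static_cast<int8_t>(q);
            }
    return dst;
}

// Usage sketch (hypothetical helper and placeholder scale):
// std::vector<float> image = loadNormalizedMnistImage();            // 1 x 28 x 28, fp32
// std::vector<int8_t> packed = toInt8Chw32(image, 1, 28, 28, 0.0078125f);
// The 'packed' buffer is what gets copied into the registered cuDLA input memory.

The output is handled in reverse: read the int8 values out of the chw32 output buffer and dequantize them as x = q * outputScale, again using the scale from the build-time quantization information.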

Thanks.


@zhi_xz
You can also check out the DLA GitHub page for samples and resources: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ