Pre-processing of fp32 inputs fed to a cuDLA application

I am learning how to program with cuDLA from the CUDA samples cuDLAHybridMode and cuDLAStandaloneMode.
First, I compile a DLA loadable with:

trtexec --deploy=/usr/src/tensorrt/data/mnist/mnist.prototxt --model=/usr/src/tensorrt/data/mnist/mnist.caffemodel --output=ip2 --useDLACore=0 --int8 --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32 --saveEngine=./mnist_int8.bin --buildOnly --safe

mnist.caffemodel is an fp32 model; after being compiled by TensorRT it becomes an int8 model. Running cuDLAStandaloneMode with mnist_int8.bin as input prints the input and output tensor descriptors as below:

Printing input tensor descriptor
        TENSOR NAME : data'
        size: 25088
        dims: [1, 1, 28, 28]
        data fmt: 3
        data type: 4
        data category: 2
        pixel fmt: 36
        pixel mapping: 0
Printing output tensor descriptor
        TENSOR NAME : ip2'
        size: 32
        dims: [1, 10, 1, 1]
        data fmt: 3
        data type: 4
        data category: 2
        pixel fmt: 36
        pixel mapping: 0

As we can see, data type 4 means int8. My question: when an fp32 model is compiled and quantized with TensorRT, do its input and output become int8? If so, how should I handle the input, since I don't know the specific quantization parameters?
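
For reference, these descriptors are obtained by querying the loaded module with cudlaModuleGetAttributes. Roughly, the pattern looks like the outline below (an illustrative sketch following the sample's approach; check the exact struct and enum names against cudla.h):

#include <cudla.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Sketch: query and print the input tensor descriptors of a loaded cuDLA module.
// 'moduleHandle' is assumed to come from a successful cudlaModuleLoadFromMemory call.
void printInputDescriptors(cudlaModule moduleHandle)
{
    cudlaModuleAttribute attr;

    // Number of input tensors in the loadable.
    if (cudlaModuleGetAttributes(moduleHandle, CUDLA_NUM_INPUT_TENSORS, &attr) != cudlaSuccess)
        return;
    uint32_t numInputs = attr.numInputTensors;

    // The caller provides storage for the descriptors, then queries them.
    cudlaModuleTensorDescriptor* desc = (cudlaModuleTensorDescriptor*)
        std::malloc(sizeof(cudlaModuleTensorDescriptor) * numInputs);
    attr.inputTensorDesc = desc;
    if (cudlaModuleGetAttributes(moduleHandle, CUDLA_INPUT_TENSOR_DESCRIPTORS, &attr) == cudlaSuccess)
    {
        for (uint32_t i = 0; i < numInputs; ++i)
        {
            // 'size' is the byte size the DLA expects for this tensor and already
            // includes the chw32 channel padding (28 * 28 * 32 = 25088 above).
            std::printf("name=%s size=%llu type=%d fmt=%d\n",
                        desc[i].name,
                        (unsigned long long)desc[i].size,
                        (int)desc[i].dataType,
                        (int)desc[i].dataFormat);
        }
    }
    std::free(desc);
}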

Hi,

It’s controlled by the trtexec flags.

If you remove the --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32 flags, the input/output format will default to float32.
A data-format conversion layer will be inserted automatically to convert between the float32 input/output and the int8 model.

Thanks.

Thanks for your reply.
I tried removing --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32, and TensorRT reported the error below. It seems that --inputIOFormats and --outputIOFormats are required when compiling a DLA loadable binary.

&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # trtexec --deploy=/usr/src/tensorrt/data/mnist/mnist.prototxt --model=/usr/src/tensorrt/data/mnist/mnist.caffemodel --output=prob --useDLACore=0 --fp16 --memPoolSize=dlaSRAM:1 --saveEngine=./mnist_fp16.bin --buildOnly --safe
[02/24/2023-11:40:38] [E] I/O formats for safe DLA capability are restricted to fp16/int8:linear, fp16:chw16 or int8:chw32

I also tried removing the --safe flag at the same time, and the TensorRT build then passed.
But loading the resulting file with cudlaModuleLoadFromMemory returned an error mask of 7.
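
The load path in question follows the sample and looks roughly like this (an illustrative sketch with error handling trimmed; check the exact names against cudla.h):

#include <cudla.h>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

// Sketch: read a loadable from disk and hand it to cuDLA (hybrid mode).
int loadLoadable(const char* path)
{
    // Read the whole file into memory.
    std::ifstream file(path, std::ios::binary);
    std::vector<uint8_t> blob((std::istreambuf_iterator<char>(file)),
                              std::istreambuf_iterator<char>());

    cudlaDevHandle dev = nullptr;
    cudlaModule module = nullptr;

    // DLA core 0 in hybrid (CUDA + DLA) mode; cuDLAStandaloneMode uses CUDLA_STANDALONE instead.
    cudlaStatus status = cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA);
    if (status != cudlaSuccess)
    {
        std::fprintf(stderr, "cudlaCreateDevice failed: %d\n", (int)status);
        return -1;
    }

    // The call that reports the error mask mentioned above.
    status = cudlaModuleLoadFromMemory(dev, blob.data(), blob.size(), &module, 0);
    if (status != cudlaSuccess)
    {
        std::fprintf(stderr, "cudlaModuleLoadFromMemory failed: %d\n", (int)status);
        cudlaDestroyDevice(dev);
        return -1;
    }

    // ... register memory and submit tasks here ...

    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}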

Gently pinging @AastaLLL, sorry to disturb. Do you have any suggestions on my question?

Hi,

Sorry for the late update and the unclear comment previously.

If you remove the IOFormats flags, TensorRT will insert the conversion layer automatically.
So the model will have a dependency on TensorRT and needs to be built without the standalone flag (--safe).
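
For clarity, an engine built that way is consumed through the regular TensorRT runtime rather than through cuDLA. A minimal sketch of that path (TensorRT 8-style API, error handling trimmed; not code from the samples above):

#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::printf("[TRT] %s\n", msg);
    }
};

// Sketch: deserialize a non-safe engine containing DLA layers and run it with
// the TensorRT runtime, which handles the fp32 <-> int8 reformatting and the
// DLA dispatch internally.
int runWithTensorRT(const char* enginePath)
{
    std::ifstream file(enginePath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    runtime->setDLACore(0);  // same DLA core the engine was built for

    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    if (engine == nullptr)
        return -1;

    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    // ... set up fp32 input/output device buffers and enqueue inference ...

    delete context;  // TensorRT 8: objects are released with delete
    delete engine;
    delete runtime;
    return 0;
}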

If a DLA standalone loadable is preferred, please follow the doc below to handle the input and output:

Thanks.

Thank you for your reply. I have read the documentation you mentioned, but it only specifies the allowed I/O formats for DLA standalone mode. It does not explain how to convert fp32 input to int8. Could you please provide some guidance on this?

Hi,

Sorry for keeping you waiting.

Below are the data format details for your reference:
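
As an illustrative sketch (not taken from the linked doc): assuming the input tensor's quantization scale is known from the build-time quantization information (for example the calibration cache, with scale roughly dynamicRange / 127), the fp32 to int8:chw32 conversion can be done along these lines. The scale in the usage comment is a placeholder:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative fp32 -> int8:chw32 conversion for a single CHW tensor.
// 'scale' is the per-tensor input quantization scale and must come from the
// calibration / dynamic-range data used when the engine was built.
std::vector<int8_t> toInt8Chw32(const std::vector<float>& src,
                                int C, int H, int W, float scale)
{
    const int cPadded = ((C + 31) / 32) * 32;  // channels padded up to a multiple of 32
    std::vector<int8_t> dst(static_cast<size_t>(cPadded) * H * W, 0);

    for (int c = 0; c < C; ++c)
        for (int h = 0; h < H; ++h)
            for (int w = 0; w < W; ++w)
            {
                // Symmetric quantization: q = clamp(round(x / scale), -128, 127).
                float x = src[(static_cast<size_t>(c) * H + h) * W + w];
                int q = static_cast<int>(std::lround(x / scale));
                q = std::max(-128, std::min(127, q));

                // chw32 layout: channels are grouped in blocks of 32 that are
                // interleaved per pixel. For MNIST (C = 1) this yields the
                // 28 * 28 * 32 = 25088-byte buffer shown in the descriptor above.
                size_t idx = ((static_cast<size_t>(c / 32) * H + h) * W + w) * 32 + (c % 32);
                dst[idx] = static_cast<int8_t>(q);
            }
    return dst;
}

// Usage sketch (hypothetical helper and placeholder scale):
// std::vector<float> image = loadNormalizedMnistImage();            // 1 x 28 x 28, fp32
// std::vector<int8_t> packed = toInt8Chw32(image, 1, 28, 28, 0.0078125f);
// The 'packed' buffer is what gets copied into the registered cuDLA input memory.

The output is handled in reverse: read the int8 values out of the chw32 output buffer and dequantize them as x = q * outputScale, again using the scale from the build-time quantization information.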

Thanks.


@zhi_xz
You can also check out the DLA GitHub page for samples and resources: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ