I am learning how to program with cuDLA using the CUDA samples cuDLAHybridMode and cuDLAStandaloneMode.
First, I compile a DLA loadable file with:
trtexec --deploy=/usr/src/tensorrt/data/mnist/mnist.prototxt --model=/usr/src/tensorrt/data/mnist/mnist.caffemodel --output=ip2 --useDLACore=0 --int8 --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32 --saveEngine=./mnist_int8.bin --buildOnly --safe
mnist.caffemodel is an fp32 model; after being compiled by TensorRT, it becomes an int8 model. Running cuDLAStandaloneMode with mnist_int8.bin as input prints the following tensor descriptors:
Printing input tensor descriptor
TENSOR NAME : data'
size: 25088
dims: [1, 1, 28, 28]
data fmt: 3
data type: 4
data category: 2
pixel fmt: 36
pixel mapping: 0
Printing output tensor descriptor
TENSOR NAME : ip2'
size: 32
dims: [1, 10, 1, 1]
data fmt: 3
data type: 4
data category: 2
pixel fmt: 36
pixel mapping: 0
As we can see, data type 4 means int8. My question: when an fp32 model is compiled and quantized with TensorRT, do its input and output become int8? If so, how should I handle the input, since I don’t know the specific quantization parameters?
Hi,
It’s controlled by the trtexec flags.
If you remove --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32, the input/output format will default to float32. A data format layer will be inserted automatically to convert between the float32 input/output and the int8 model.
Thanks.
Thanks for your reply.
I tried removing --inputIOFormats=int8:chw32 --outputIOFormats=int8:chw32, and TensorRT reported the error below. It seems that inputIOFormats and outputIOFormats are required when compiling a DLA loadable binary file.
&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # trtexec --deploy=/usr/src/tensorrt/data/mnist/mnist.prototxt --model=/usr/src/tensorrt/data/mnist/mnist.caffemodel --output=prob --useDLACore=0 --fp16 --memPoolSize=dlaSRAM:1 --saveEngine=./mnist_fp16.bin --buildOnly --safe
[02/24/2023-11:40:38] [E] I/O formats for safe DLA capability are restricted to fp16/int8:linear, fp16:chw16 or int8:chw32
I also tried removing the --safe flag at the same time, and the TensorRT compilation passed. But loading the resulting loadable file with cudlaModuleLoadFromMemory returned error mask 7.
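For reference, my loading path is essentially the one from the cuDLAStandaloneMode sample; a minimal sketch (the helper that reads the loadable bytes into loadableData/loadableSize is omitted):

#include <cudla.h>
#include <cstdint>
#include <cstdio>

bool loadModule(const uint8_t* loadableData, size_t loadableSize)
{
    cudlaDevHandle devHandle = nullptr;
    cudlaModule moduleHandle = nullptr;

    // CUDLA_STANDALONE selects standalone mode (CUDLA_CUDA_DLA would select hybrid).
    cudlaStatus err = cudlaCreateDevice(0, &devHandle, CUDLA_STANDALONE);
    if (err != cudlaSuccess)
    {
        printf("cudlaCreateDevice failed: %d\n", (int)err);
        return false;
    }

    // This is the call that fails with error mask 7 for the non-safe engine.
    err = cudlaModuleLoadFromMemory(devHandle, loadableData, loadableSize,
                                    &moduleHandle, 0);
    if (err != cudlaSuccess)
    {
        printf("cudlaModuleLoadFromMemory failed: %d\n", (int)err);
        cudlaDestroyDevice(devHandle);
        return false;
    }
    return true;
}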
Gently ping @AastaLLL, sorry to disturb. Do you have any suggestions on my question?
Hi,
Sorry for the late update and the unclear comment previously.
If you remove the IOFormats flag, TensorRT will insert the conversion layer automatically.
So the model will have a dependency on TensorRT and needs to be compiled without the standalone flag (--safe).
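In that case, the engine is run through the TensorRT runtime, which performs the fp32 ↔ int8 reformatting for you. Roughly like this (a minimal sketch, not the exact sample code; the serialized engine bytes, the device buffers, and the binding order are assumptions):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstdio>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            printf("%s\n", msg);
    }
};

void runWithTensorRT(const void* engineData, size_t engineSize,
                     void* dInputFp32, void* dOutputFp32)
{
    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    runtime->setDLACore(0); // the engine was built for DLA core 0

    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(engineData, engineSize);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // With the default IO formats the bindings are plain fp32 device buffers;
    // the automatically inserted reformat layers convert to/from int8.
    void* bindings[] = {dInputFp32, dOutputFp32}; // binding order assumed
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}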
If a DLA standalone loadable is preferred, please follow the doc below to handle the input and output:
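The descriptors you printed (name, padded size, formats) come from the loadable itself and can be queried with cudlaModuleGetAttributes; a minimal sketch following the cuDLAStandaloneMode pattern, with error checking omitted:

#include <cudla.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

void printInputDescriptors(cudlaModule moduleHandle)
{
    cudlaModuleAttribute attribute;

    // Number of input tensors.
    cudlaModuleGetAttributes(moduleHandle, CUDLA_NUM_INPUT_TENSORS, &attribute);
    uint32_t numInputs = attribute.numInputTensors;

    // The caller provides the descriptor array to fill.
    cudlaModuleTensorDescriptor* descs = (cudlaModuleTensorDescriptor*)
        malloc(numInputs * sizeof(cudlaModuleTensorDescriptor));
    attribute.inputTensorDesc = descs;
    cudlaModuleGetAttributes(moduleHandle, CUDLA_INPUT_TENSOR_DESCRIPTORS, &attribute);

    for (uint32_t i = 0; i < numInputs; ++i)
    {
        // `size` is the padded byte size the DLA expects for this tensor
        // (25088 for the 1x1x28x28 int8:chw32 input above).
        printf("input %u: %s, %llu bytes\n", i, descs[i].name,
               (unsigned long long)descs[i].size);
    }
    free(descs);
}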
Thanks.
Thank you for your reply. I have read the documentation you mentioned, but it only specifies the allowed I/O formats for DLA standalone mode. It does not explain how to convert fp32 input to int8. Could you please provide some guidance on this?
Hi,
Sorry for keeping you waiting.
Below is the data format detail for your reference:
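To make it concrete, here is a minimal sketch of the fp32 ↔ int8:chw32 conversion. It assumes you know the input and output scales from calibration (for example, from the calibration cache, or a dynamic range set at build time divided by 127):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize fp32 CHW data into the int8:chw32 layout the DLA expects.
// chw32 packs channels in groups of 32 and zero-pads unused channels.
std::vector<int8_t> fp32ToInt8Chw32(const std::vector<float>& src,
                                    int C, int H, int W, float scale)
{
    const int groups = (C + 31) / 32;
    std::vector<int8_t> dst(static_cast<size_t>(groups) * 32 * H * W, 0);
    for (int c = 0; c < C; ++c)
        for (int h = 0; h < H; ++h)
            for (int w = 0; w < W; ++w)
            {
                float q = std::round(src[(static_cast<size_t>(c) * H + h) * W + w] / scale);
                q = std::max(-128.0f, std::min(127.0f, q));
                // chw32 offset: ((c/32)*H*W + h*W + w)*32 + (c%32)
                size_t idx = ((static_cast<size_t>(c / 32) * H + h) * W + w) * 32 + (c % 32);
                dst[idx] = static_cast<int8_t>(q);
            }
    return dst;
}

// Dequantize the int8:chw32 output back to fp32 with the output scale.
std::vector<float> int8Chw32ToFp32(const std::vector<int8_t>& src,
                                   int C, int H, int W, float scale)
{
    std::vector<float> dst(static_cast<size_t>(C) * H * W);
    for (int c = 0; c < C; ++c)
        for (int h = 0; h < H; ++h)
            for (int w = 0; w < W; ++w)
            {
                size_t idx = ((static_cast<size_t>(c / 32) * H + h) * W + w) * 32 + (c % 32);
                dst[(static_cast<size_t>(c) * H + h) * W + w] = src[idx] * scale;
            }
    return dst;
}

For the tensors above this reproduces the reported sizes: the 1x1x28x28 input becomes 32*28*28 = 25088 bytes, and the 1x10x1x1 output occupies 32 bytes.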
Thanks.