Unable to parse ONNX network with int8 operations

Description

When parsing a network with int8 input, the parser fails on any subsequent int8 operations. I’ve added an overview of the network below; the full ONNX file is also attached. The input is int8, and the Cast converts it to float32. I’d like to know why the parser considers this invalid. Note that passing int8 input and immediately casting works fine.

I’ve been digging into the TensorRT support matrix: the IElementWiseLayer does not support int8 precision, which is probably why my ONNX model fails to parse. Can someone shed some light on why operators like 2D convolution support int8, while the most basic elementwise operators don’t?

Screenshot from 2021-03-10 14-45-33
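For reference, a rough equivalent of the graph can be built with onnx.helper along the lines below (the tensor names and the constant value here are only illustrative, not taken verbatim from the attached file):

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# int8 input of shape (1024, 3), matching the shapes reported in the parser log
inp = helper.make_tensor_value_info("input:0", TensorProto.INT8, [1024, 3])
out = helper.make_tensor_value_info("cast:0", TensorProto.FLOAT, [1024, 3])

# scalar int8 constant for the Sub node (value chosen for illustration)
sub_y = numpy_helper.from_array(np.array(-128, dtype=np.int8), name="sub/y:0")

sub_node = helper.make_node("Sub", ["input:0", "sub/y:0"], ["sub:0"], name="sub")
cast_node = helper.make_node("Cast", ["sub:0"], ["cast:0"], to=TensorProto.FLOAT, name="cast")

graph = helper.make_graph([sub_node, cast_node], "intTest", [inp], [out], initializer=[sub_y])
model = helper.make_model(graph)
onnx.checker.check_model(model)
onnx.save(model, "intTest_repro.onnx")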

Details

The second layer, Sub, fails to parse, complaining about int8 being an invalid weight type. Snippet of the relevant logs from ./trtexec --onnx="./intTest.onnx" --verbose :

[03/10/2021-13:22:06] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:107: Parsing node: PartitionedCall/sub [Sub]
[03/10/2021-13:22:06] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:123: Searching for input: input:0
[03/10/2021-13:22:06] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:123: Searching for input: PartitionedCall/sub/y:0
[03/10/2021-13:22:06] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:129: PartitionedCall/sub [Sub] inputs: [input:0 -> (1024, 3)], [PartitionedCall/sub/y:0 -> ()], 
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
[03/10/2021-13:22:06] [W] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
While parsing node number 0 [Sub -> "PartitionedCall/sub:0"]:
--- Begin node ---
input: "input:0"
input: "PartitionedCall/sub/y:0"
output: "PartitionedCall/sub:0"
name: "PartitionedCall/sub"
op_type: "Sub"

--- End node ---
ERROR: /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:673 In function elementwiseHelper:
[8] Assertion failed: tensor_ptr->getDimensions().nbDims == maxNbDims && "Failed to broadcast tensors elementwise!"
[03/10/2021-13:22:06] [E] Failed to parse onnx file

Environment

TensorRT Version: 7.1.3
GPU Type:
Nvidia Driver Version: 450.102.04
CUDA Version: 11.0.3
CUDNN Version: 8.0.4
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.09-py3

Relevant Files

intTest.onnx (641 Bytes)

Steps To Reproduce

  1. Start the nvcr.io/nvidia/tensorrt:20.09-py3 container, mounting a folder where the attached intTest.onnx file is placed.
  2. Go to the tensorrt/bin folder and run trtexec --onnx="mounted_path/intTest.onnx" --verbose

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # replace with the path to your model
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

Hi, the ONNX model is already attached under the Relevant Files section.

I’ve tried out the check_model.py steps; check_model doesn’t return anything (I assume that’s a good sign).

I probably didn’t make it very clear in the post, but I already found that int8 subtraction is not supported by TensorRT after looking at the support matrix. If I remove the subtraction, parsing works. I’m still left with the question of why some more advanced operations such as 2D convolutions support int8, but elementwise operations like subtraction don’t.

Hi @kristof1,

Based on the error you’ve shared, “[E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8”:
we do not support INT8 weights because TRT performs the weights quantization itself.

Thank you.

Hi, thank you for the reply. At this moment I don’t want to apply quantization at all; I merely want to pass my data as int8 to reduce transfer times and avoid having to copy data on the CPU.

On the CPU I have uint8 data. Since TensorRT does not support unsigned types, I have to make sure that the int8 data is interpreted correctly. Subtracting 128 in int8 arithmetic (assuming two’s complement), then casting to float32 and adding 128 again gives me the same result (after division by 255) as if I were to convert my uint8 data to float32 on the CPU and pass that, with the benefit of avoiding the CPU conversion and copying four times less data from CPU to GPU.
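As a quick single-value illustration of that arithmetic (using 200 as an example value):

import numpy as np

x = np.array([200], dtype=np.uint8)        # original uint8 value on the CPU
as_int8 = x.view(np.int8)                  # reinterpret the same byte as int8 -> -56
shifted = (as_int8.astype(np.int16) - 128).astype(np.int8)  # subtract 128 with int8 wraparound -> 72
recovered = shifted.astype(np.float32) + 128.0              # cast to float32, add 128 back -> 200.0
print(as_int8[0], shifted[0], recovered[0] / 255.0)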

If I move the subtraction out of the model and do it on the CPU, everything works fine. However, for my application I can’t modify the original data, so I have to make a copy before doing the subtraction, which is why I wanted to integrate it into the model.


Hi @kristof1,

Sorry for delayed response.
Currently we see that you are not preprocessing the data correctly. You have uint8 data and want to subtract 128, but the input is already int8_t, so the data has already been truncated. Your model will not work as intended.
The ideal solution would be to first subtract 128 in your own CPU code, and then create the following network:
Input ------------ Add ----…
Constant 128 -/
This 128 should be a float. Then enable int8 and set only the input’s dynamic range to [-127, 127]. TensorRT will automatically convert the data; you will not need to do any additional cast.
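Roughly, with the TensorRT Python API this would look something like the following sketch (shapes and names are only illustrative):

import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)

# int8 input, already shifted by -128 on the CPU, with its dynamic range set explicitly
inp = network.add_input("input", trt.int8, (1, 1024, 3))
inp.set_dynamic_range(-127.0, 127.0)

# float constant 128 added back inside the network
const = network.add_constant((1, 1, 1), trt.Weights(np.array([128.0], dtype=np.float32)))
add = network.add_elementwise(inp, const.get_output(0), trt.ElementWiseOperation.SUM)
network.mark_output(add.get_output(0))

engine = builder.build_engine(network, config)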

Thank you.

Hi @spolisetty, thank you for revisiting my question.

I did not immediately see the problem with my workflow, so I wrote a step-by-step test in Python to verify that my procedure gives the correct result under two’s complement.

If you compare the result at the end, it is the same as directly converting uint8 to float32. As TensorRT does not know unsigned types and interprets my copied bytes as int8, I have to take this workaround.

import numpy as np

uint8_data = np.arange(0, 256, dtype=np.uint8)
print("uint8 range {}".format(uint8_data))
print("Directly converting uint8 to float {}".format(uint8_data.astype(np.float32)))

# Reinterpret the same bytes as int8: values >= 128 wrap around to negative.
int8_data = uint8_data.astype(np.int8)
print("uint8 interpreted as int8 {}".format(int8_data))

# Subtract 128 in int8 arithmetic (wraps around under two's complement).
# Copy first so the original int8 view is left untouched.
shifted_int8 = int8_data.copy()
shifted_int8 -= 128
print("subtract 128 in int8 {}".format(shifted_int8))

# Cast to float32 and add 128 back: this recovers the original uint8 values.
print("Now convert to float and then add 128 {}".format(shifted_int8.astype(np.float32) + 128.0))

for which I get the output

kristof@kristof-XPS-15-9570:~/optimum/cloudsorter_6d$ python3 test.py
uint8 range [  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107
 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197
 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215
 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233
 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251
 252 253 254 255]
Directly converting uint8 to float [  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
  14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.
  28.  29.  30.  31.  32.  33.  34.  35.  36.  37.  38.  39.  40.  41.
  42.  43.  44.  45.  46.  47.  48.  49.  50.  51.  52.  53.  54.  55.
  56.  57.  58.  59.  60.  61.  62.  63.  64.  65.  66.  67.  68.  69.
  70.  71.  72.  73.  74.  75.  76.  77.  78.  79.  80.  81.  82.  83.
  84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
  98.  99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.
 112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
 126. 127. 128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139.
 140. 141. 142. 143. 144. 145. 146. 147. 148. 149. 150. 151. 152. 153.
 154. 155. 156. 157. 158. 159. 160. 161. 162. 163. 164. 165. 166. 167.
 168. 169. 170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181.
 182. 183. 184. 185. 186. 187. 188. 189. 190. 191. 192. 193. 194. 195.
 196. 197. 198. 199. 200. 201. 202. 203. 204. 205. 206. 207. 208. 209.
 210. 211. 212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223.
 224. 225. 226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237.
 238. 239. 240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251.
 252. 253. 254. 255.]
uint8 interpreted as int8 [   0    1    2    3    4    5    6    7    8    9   10   11   12   13
   14   15   16   17   18   19   20   21   22   23   24   25   26   27
   28   29   30   31   32   33   34   35   36   37   38   39   40   41
   42   43   44   45   46   47   48   49   50   51   52   53   54   55
   56   57   58   59   60   61   62   63   64   65   66   67   68   69
   70   71   72   73   74   75   76   77   78   79   80   81   82   83
   84   85   86   87   88   89   90   91   92   93   94   95   96   97
   98   99  100  101  102  103  104  105  106  107  108  109  110  111
  112  113  114  115  116  117  118  119  120  121  122  123  124  125
  126  127 -128 -127 -126 -125 -124 -123 -122 -121 -120 -119 -118 -117
 -116 -115 -114 -113 -112 -111 -110 -109 -108 -107 -106 -105 -104 -103
 -102 -101 -100  -99  -98  -97  -96  -95  -94  -93  -92  -91  -90  -89
  -88  -87  -86  -85  -84  -83  -82  -81  -80  -79  -78  -77  -76  -75
  -74  -73  -72  -71  -70  -69  -68  -67  -66  -65  -64  -63  -62  -61
  -60  -59  -58  -57  -56  -55  -54  -53  -52  -51  -50  -49  -48  -47
  -46  -45  -44  -43  -42  -41  -40  -39  -38  -37  -36  -35  -34  -33
  -32  -31  -30  -29  -28  -27  -26  -25  -24  -23  -22  -21  -20  -19
  -18  -17  -16  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5
   -4   -3   -2   -1]
subtract 128 in int8 [-128 -127 -126 -125 -124 -123 -122 -121 -120 -119 -118 -117 -116 -115
 -114 -113 -112 -111 -110 -109 -108 -107 -106 -105 -104 -103 -102 -101
 -100  -99  -98  -97  -96  -95  -94  -93  -92  -91  -90  -89  -88  -87
  -86  -85  -84  -83  -82  -81  -80  -79  -78  -77  -76  -75  -74  -73
  -72  -71  -70  -69  -68  -67  -66  -65  -64  -63  -62  -61  -60  -59
  -58  -57  -56  -55  -54  -53  -52  -51  -50  -49  -48  -47  -46  -45
  -44  -43  -42  -41  -40  -39  -38  -37  -36  -35  -34  -33  -32  -31
  -30  -29  -28  -27  -26  -25  -24  -23  -22  -21  -20  -19  -18  -17
  -16  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3
   -2   -1    0    1    2    3    4    5    6    7    8    9   10   11
   12   13   14   15   16   17   18   19   20   21   22   23   24   25
   26   27   28   29   30   31   32   33   34   35   36   37   38   39
   40   41   42   43   44   45   46   47   48   49   50   51   52   53
   54   55   56   57   58   59   60   61   62   63   64   65   66   67
   68   69   70   71   72   73   74   75   76   77   78   79   80   81
   82   83   84   85   86   87   88   89   90   91   92   93   94   95
   96   97   98   99  100  101  102  103  104  105  106  107  108  109
  110  111  112  113  114  115  116  117  118  119  120  121  122  123
  124  125  126  127]
Now convert to float and then add 128 [  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
  14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.
  28.  29.  30.  31.  32.  33.  34.  35.  36.  37.  38.  39.  40.  41.
  42.  43.  44.  45.  46.  47.  48.  49.  50.  51.  52.  53.  54.  55.
  56.  57.  58.  59.  60.  61.  62.  63.  64.  65.  66.  67.  68.  69.
  70.  71.  72.  73.  74.  75.  76.  77.  78.  79.  80.  81.  82.  83.
  84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
  98.  99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.
 112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
 126. 127. 128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139.
 140. 141. 142. 143. 144. 145. 146. 147. 148. 149. 150. 151. 152. 153.
 154. 155. 156. 157. 158. 159. 160. 161. 162. 163. 164. 165. 166. 167.
 168. 169. 170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181.
 182. 183. 184. 185. 186. 187. 188. 189. 190. 191. 192. 193. 194. 195.
 196. 197. 198. 199. 200. 201. 202. 203. 204. 205. 206. 207. 208. 209.
 210. 211. 212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223.
 224. 225. 226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237.
 238. 239. 240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251.
 252. 253. 254. 255.]

Now, I can do the minus 128 in int8 on the CPU (which I do, and everything works exactly as if I were to pass float32 data and start from there), but it requires me to copy data on the CPU side, as I can’t alter the original data. I would also find it a bit cleaner if this were hidden from client code.

Is the dynamic range really necessary? If the input is already int8 and in the range [-128, 127], there really isn’t any quantization step that needs to be performed. Is something being executed even though the input is already int8?

Hi @kristof1,

If we specify int8 for the input, then we must use dynamic range; otherwise TensorRT won’t be able to generate the engine. Our int8 is not designed for such usage.

Thank you.