TensorRT 5 UFFParser error about kernel weights

I'm trying to convert a TensorFlow model and run inference with it.

I build the TensorFlow model with tf.keras:

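from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten

model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(224,224,3), name='image'))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())  # listed in the model summary below
model.add(Dense(num_classes, activation='softmax', name='output'))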

I train the model, save it as a frozen-graph .pb file, and convert that file to UFF with:

uff_model = uff.from_tensorflow_frozen_model(

When I parse the model with:

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.UffParser()
parser.register_input("image_input", (1, 224, 224, 3))
parser.parse("model/tensorrt/simplemodel.uff", network)

I get this error:

[TensorRT] INFO: UFFParser: parsing image_input
[TensorRT] INFO: UFFParser: parsing image/kernel
[TensorRT] INFO: UFFParser: parsing image/Conv2D
[TensorRT] INFO: UFFParser: parsing image/bias
[TensorRT] INFO: UFFParser: parsing image/BiasAdd
[TensorRT] ERROR: image/Conv2D: kernel weights has count 1728 but 129024 was expected
[TensorRT] ERROR: UFFParser: Parser error: image/BiasAdd: The input to the Scale Layer is required to have a minimum of 3 dimensions.

Why is there an error about the kernel weights count? I set the input shape to the correct size (224,224,3…). This is my model:

Layer (type)                 Output Shape              Param #   
image (Conv2D)               (None, 222, 222, 64)      1792      
conv2d (Conv2D)              (None, 220, 220, 32)      18464     
flatten (Flatten)            (None, 1548800)           0         
output (Dense)               (None, 6)                 9292806   
Total params: 9,313,062
Trainable params: 9,313,062
Non-trainable params: 0
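
The shapes and parameter counts in this summary can be reproduced by hand, which helps when reading the weight count in the error message. The sketch below assumes Keras defaults for Conv2D (stride 1, 'valid' padding, bias enabled); the helper names are hypothetical and only used for illustration.

```python
# Hypothetical helpers for checking the model summary by hand.
# Assumes stride 1, 'valid' padding, and a bias term per output unit.

def conv2d_output_hw(in_h, in_w, kernel):
    """Spatial output size of a 'valid' convolution with stride 1."""
    return in_h - kernel + 1, in_w - kernel + 1

def conv2d_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_features, out_features):
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features

h1, w1 = conv2d_output_hw(224, 224, 3)   # 'image' layer output size
p1 = conv2d_params(3, 3, 64)

h2, w2 = conv2d_output_hw(h1, w1, 3)     # 'conv2d' layer output size
p2 = conv2d_params(3, 64, 32)

flat = h2 * w2 * 32                      # 'flatten' layer size
p3 = dense_params(flat, 6)               # 'output' layer, 6 classes

total = p1 + p2 + p3
print((h1, w1), p1)
print((h2, w2), p2)
print(flat, p3)
print(total)
```

The first layer's 1792 parameters are 1728 kernel weights plus 64 biases.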


First, check your TensorRT version with trt.__version__.
If it is an older version, the problem may be how the input is registered.
Change from
parser.register_input("image_input", (1, 224, 224, 3))
to
parser.register_input("image_input", (3, 224, 224))
This is because the parser always requires the input dimensions in CHW order (without the batch dimension), no matter which format you use in your model.
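
The two numbers in the error message are consistent with the parser reading 224 as the channel count when the shape is given as (1, 224, 224, 3). This is an inference from the arithmetic, not something the thread or the TensorRT log states directly; the function name below is hypothetical.

```python
# A 2D convolution kernel holds kernel_h * kernel_w * in_channels * out_channels weights.
def conv_kernel_weight_count(kernel_h, kernel_w, in_channels, out_channels):
    return kernel_h * kernel_w * in_channels * out_channels

# What the frozen graph actually contains: 3x3 kernel, 3 input channels, 64 filters.
actual = conv_kernel_weight_count(3, 3, 3, 64)

# What the parser would expect if it took 224 as the number of input channels
# (an assumption about how the 4-element shape was interpreted).
expected_by_parser = conv_kernel_weight_count(3, 3, 224, 64)

print(actual)              # count reported as present in the error
print(expected_by_parser)  # count reported as expected in the error
```

With the shape registered as (3, 224, 224), the channel count is 3 and the two values agree.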

Thanks, now the parser is working!

But when I run inference on an image, I get very low accuracy:

img = image.load_img(img_path, target_size=(224,224)) # Shape (224,224,3)
input_data = image.img_to_array(img)
input_data = input_data.astype(np.float32)
input_data = input_data.transpose(2, 1, 0) # shape(3,224,224)
np.copyto(h_input, input_data.ravel())
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)

PS: with TF-TRT create_inference_graph I get a TensorRT-optimized graph that works without prediction errors.
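
One thing that may be worth checking in the snippet above is the axis order of the transpose. For an image stored as (height, width, channels), transpose(2, 1, 0) yields (channels, width, height), while transpose(2, 0, 1) yields (channels, height, width). On a square 224x224 image both give the shape (3, 224, 224), so the shape comment looks right either way. Whether this explains the low accuracy here is an assumption; the thread does not confirm it. A minimal NumPy sketch with made-up array contents:

```python
import numpy as np

# Hypothetical stand-in for a decoded image in HWC layout.
# A non-square shape makes an axis mix-up visible in the result shape.
hwc = np.arange(2 * 4 * 3, dtype=np.float32).reshape(2, 4, 3)

chw = hwc.transpose(2, 0, 1)  # (channels, height, width)
cwh = hwc.transpose(2, 1, 0)  # (channels, width, height): rows and columns swapped

print(chw.shape)
print(cwh.shape)

# On a square image the two results have the same shape but different contents.
square = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
same_shape = square.transpose(2, 0, 1).shape == square.transpose(2, 1, 0).shape
same_values = np.array_equal(square.transpose(2, 0, 1), square.transpose(2, 1, 0))
print(same_shape, same_values)
```

If the network was trained on (height, width, channels) images, feeding the (channels, width, height) variant corresponds to feeding each image flipped along its diagonal.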


Well, I'm trying to solve a similar problem.
At first I got totally meaningless results. Now it works with the transpose operation you used.
So thanks, too.

import time

import cv2
import numpy as np
import pycuda.autoinit  # assumption: creates the CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt
from PIL import Image

def build_engine(model, input):
    trt_logger = trt.Logger(trt.Logger.INFO)
    # trt_logger = trt.Logger(trt.Logger.VERBOSE)
    # trt.init_libnvinfer_plugins(trt_logger, '')
    # Initialize runtime needed for loading TensorRT engine from file
    # trt_runtime = trt.Runtime(trt_logger)
    # TRT engine placeholder
    trt_engine = None
    trt_engine_datatype = trt.DataType.FLOAT
    batch_size = 1
    silent = False
    with trt.Builder(trt_logger) as builder, builder.create_network() as network, trt.UffParser() as parser:
        builder.max_workspace_size = 1 << 20
        if trt_engine_datatype == trt.DataType.HALF:
            builder.fp16_mode = True
        builder.max_batch_size = batch_size
        # Dimensions are given as (C, H, W); the input order is declared as NHWC
        parser.register_input(input, (3, 300, 300), trt.UffInputOrder.NHWC)
        parser.parse(model, network)
        engine = builder.build_cuda_engine(network)
        return engine
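def do_inference(context, bindings, inputs, outputs, stream):
    # The body of this function was cut off in the post. The steps below are a
    # reconstruction of the usual PyCUDA pattern, not the poster's exact code.
    # inputs is one (host, device) pair; outputs is a list of such pairs.
    cuda.memcpy_htod_async(inputs[1], inputs[0], stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    for output in outputs:
        cuda.memcpy_dtoh_async(output[0], output[1], stream)
    stream.synchronize()
    return outputs

data_path = 'data/cat1.jpg'
# engine is assumed to come from build_engine(...); the call is not shown in the post.
stream = cuda.Stream()
bindings = []
outputs = []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host = cuda.pagelocked_empty(size, dtype)  # page-locked host buffer
    device = cuda.mem_alloc(host.nbytes)       # matching device buffer
    # print(len(host))
    bindings.append(int(device))               # reconstructed: missing from the post
    if engine.binding_is_input(binding):       # reconstructed: missing from the post
        inputs = (host, device)
    else:                                      # reconstructed: missing from the post
        outputs.append((host, device))
context = engine.create_execution_context()
# input_H, input_W, mean_color and test_num are defined elsewhere in the poster's script.
img = cv2.resize(np.array(Image.open(data_path)), (input_H, input_W))
img_mean = img - mean_color
img_trans = img_mean.transpose(2, 1, 0)
img_flat = img_trans.ravel()
np.copyto(inputs[0], img_flat)                 # reconstructed: missing from the post
t = time.time()
outputs = do_inference(context, bindings, inputs, outputs, stream)
loc, conf = outputs[0][0], outputs[1][0]
print('Conf:{0} \nConf Num:{1}\nLoc: {2} \nLoc Num: {3} '.format(conf, len(conf), loc, len(loc)))
print("test_num: " + str(test_num))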

Oddly, I registered the input with NHWC format, but it only works when I feed the input in CHW format. I noticed that your code registers the input with the default format (which should be NCHW), but it already applies a transpose, so it should be correct!
And don't forget that accelerating a model with TensorRT can cost some accuracy.

I've tried trt.UffInputOrder.NHWC with the same transpose op, but I get too many errors (only 12% accuracy). If I change the UffInputOrder or the transpose op, the accuracy stays around 10-15%.
With the original TensorFlow model I get 100% accuracy on the validation set (it is a very simple dataset, with circles and squares…).