Transpose does not work in TensorRT

I converted a TensorFlow model containing a transpose operation to UFF. However, when I import the UFF file with TensorRT, the output is not transposed as I expect. The output is the same as that of a UFF model which does not contain the transpose operation.

Hi,

Transpose is implemented with the Shuffle layer in TensorRT.
Could you share more layer information for our reference?

The Shuffle layer's documentation can be found at:

Class List >> nvinfer1 >> IShuffleLayer @ /usr/share/doc/tensorrt/
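
For reference, a transpose can also be expressed explicitly with the Shuffle layer when building a network by hand. A minimal sketch using the TensorRT Python network API (the `network` and `inp` objects are assumed to already exist, and the attribute names follow the newer Python bindings, so they may differ in older releases):

import tensorrt as trt

# assumes `network` is an INetworkDefinition and `inp` an ITensor of shape CHW
# (the batch dimension is implicit in the legacy implicit-batch API)
shuffle = network.add_shuffle(inp)

# permute CHW -> HWC, matching tf.transpose(..., perm=[0, 2, 3, 1])
shuffle.first_transpose = trt.Permutation([1, 2, 0])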

Thanks.

Hi AastaLLL:
The only difference is that I added a transpose operation at the end of the graph:
pre_out = tf.reshape(pre_out, [1, side_, side_, num*deep_])
pre_out = tf.transpose(pre_out, perm=[0,2,3,1])

However, when I convert it to UFF, the output shows it is not transposed at all. The result is the same as with the UFF model that does not contain the transpose operation.
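
For clarity, this is the effect the transpose should have on the output shape, checked with NumPy (the dimension values below are hypothetical placeholders for side_, num, and deep_):

import numpy as np

# hypothetical values standing in for side_, num, deep_
side_, num, deep_ = 13, 5, 25
x = np.random.rand(1, side_, side_, num * deep_).astype(np.float32)

# same axis order as tf.transpose(pre_out, perm=[0, 2, 3, 1])
y = np.transpose(x, (0, 2, 3, 1))
print(x.shape, "->", y.shape)  # (1, 13, 13, 125) -> (1, 13, 125, 13)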

Hi,

Could you share the pbtxt file generated by executing this command:

uff.from_tensorflow(graphdef=frozen_graph,
                    output_filename=UFF_OUTPUT_FILENAME,
                    output_nodes=OUTPUT_NAMES,
                    text=True)
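
For context, a fuller sketch of the surrounding conversion flow, assuming a TF1 session with the model already built (OUTPUT_NAMES and the node name here are assumptions to be adapted to your model):

import uff
import tensorflow as tf
from tensorflow.python.framework import graph_util

OUTPUT_NAMES = ["pre_out"]        # hypothetical output node name
UFF_OUTPUT_FILENAME = "model.uff"

with tf.Session() as sess:
    # freeze variables into constants so the GraphDef is self-contained
    frozen_graph = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, OUTPUT_NAMES)

# text=True also writes a human-readable .pbtxt alongside the .uff
uff.from_tensorflow(graphdef=frozen_graph,
                    output_filename=UFF_OUTPUT_FILENAME,
                    output_nodes=OUTPUT_NAMES,
                    text=True)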

Thanks.

Hi AastaLLL:
The pbtxt file is in the attachments.

The code I use to get the output from the UFF model is below:

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit

def infer(context, input_img, batch_size):
    # load engine
    engine = context.get_engine()
    assert(engine.get_nb_bindings() == 2)
    # create output array to receive data
    dims = engine.get_binding_dimensions(1).to_DimsCHW()
    elt_count = dims.C() * dims.H() * dims.W() * batch_size
    # convert input data to float32
    input_img = input_img.astype(np.float32)
    # allocate pagelocked host memory for the output
    output = cuda.pagelocked_empty(elt_count, dtype=np.float32)

    # allocate device memory
    d_input = cuda.mem_alloc(batch_size * input_img.size * input_img.dtype.itemsize)
    d_output = cuda.mem_alloc(batch_size * output.size * output.dtype.itemsize)

    bindings = [int(d_input), int(d_output)]

    stream = cuda.Stream()

    # transfer input data to device
    cuda.memcpy_htod_async(d_input, input_img, stream)
    # execute model
    context.enqueue(batch_size, bindings, stream.handle, None)
    # transfer predictions back
    cuda.memcpy_dtoh_async(output, d_output, stream)
    # wait for the async copies to finish before reading the output
    stream.synchronize()

    # return predictions
    return output

out = infer(context, img_resized_np, 1)
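
A quick way to check whether the transpose was applied is to run the same input through TensorFlow and compare; `sess`, `input_ph`, and `pre_out` refer back to the graph-building code above and are otherwise hypothetical names:

# compare TensorRT output against TensorFlow's own result for the same input
tf_out = sess.run(pre_out, feed_dict={input_ph: img_resized_np[None]})  # add batch dim if needed
print(np.allclose(out.reshape(tf_out.shape), tf_out, atol=1e-3))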

Thanks
dark.uff.7z (1.83 KB)

Hi,

Thanks for your attachment.
We are checking this issue and will update you with more information later.

Thanks.

Hi,

Sorry for the late reply.

After discussing internally, we recommend keeping the network in NCHW format in TensorFlow so that it works in TensorRT.
Handling becomes tricky if the network mixes NHWC and NCHW layouts.
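
For example, a graph built entirely in NCHW might look like the minimal sketch below; the layer, tensor names, and shapes are illustrative only:

import tensorflow as tf

# input already in N, C, H, W order; shapes are illustrative
x = tf.placeholder(tf.float32, [1, 3, 416, 416], name="input")
w = tf.get_variable("conv1_w", [3, 3, 3, 16])

# data_format="NCHW" keeps the whole graph in the layout TensorRT expects
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME",
                 data_format="NCHW", name="conv1")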

Thanks.