I am trying to port my TensorFlow model to TensorRT by building the network manually, but the convolution layer gives wrong results: sometimes -inf and sometimes NaN.
Here is the first convolution layer's configuration:
the input size is [3, 256, 512] (C, H, W),
and the weight shape is [32, 3, 7, 7] (output maps, input channels, kernel H, kernel W).
With this setup, the first convolution layer returns -inf for every pixel. Even when I use np.random to generate a random weight tensor, the result does not change.
When I use only one input channel ([1, 256, 512] input and [32, 1, 7, 7] weights), the first 21 feature maps come out correct, but the remaining 11 are still all -inf.
I have tried every possible ordering for the input, the output, and even the weights, but nothing changes.
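If it matters, the reordering I am aiming for is HWIO (as TensorFlow stores it) to OIHW/KCRS (as TensorRT expects it, flattened). A standalone sketch of that conversion, with random placeholder weights rather than my real ones:

import numpy as np

w_tf = np.random.randn(7, 7, 3, 32).astype(np.float32)    # H, W, I, O as TensorFlow stores it
w_trt = np.transpose(w_tf, (3, 2, 0, 1))                   # O, I, H, W
w_trt = np.ascontiguousarray(w_trt).reshape(-1)            # flattened for trt.infer.Weights

# this is equivalent to the two np.rollaxis calls in my code below
w_roll = np.rollaxis(np.rollaxis(w_tf, 3, 0), 3, 1).reshape(-1)
assert np.array_equal(w_trt, w_roll)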
Here is my code:
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
builder = trt.infer.create_infer_builder(G_LOGGER)
network = builder.create_network()
data = network.add_input("data", trt.infer.DataType.FLOAT, (INPUT_C, INPUT_H, INPUT_W))
# TensorFlow stores the convolution weights as H x W x I x O
conv1_w = graph.get_tensor_by_name('model/encoder/Conv/weights/read:0')
conv1_w = sess.run(conv1_w)                     # --> H W I O
kernelSize = conv1_w.shape[0]
num_outputMaps = conv1_w.shape[3]
## conv1_w = np.random.randn(kernelSize, kernelSize, 3, 32).astype(np.float32)  ## also tried this
# reorder H W I O --> O I H W (TensorRT layout) and flatten
conv1_w = np.rollaxis(conv1_w, 3, 0)
conv1_w = np.rollaxis(conv1_w, 3, 1)
conv1_w = conv1_w.reshape(-1)
conv1_w = trt.infer.Weights(conv1_w)
conv1_b = np.zeros(num_outputMaps, dtype=np.float32).reshape(-1)
conv1_b = trt.infer.Weights(conv1_b)
Conv1 = network.add_convolution(data, num_outputMaps, (kernelSize, kernelSize), conv1_w, conv1_b)
Conv1.set_stride((1, 1))
Conv1.get_output(0).set_name('outTsr')
network.mark_output(Conv1.get_output(0))
engine = builder.build_cuda_engine(network)
context = engine.create_execution_context()
# 7x7 kernel, stride 1, no padding --> output is (32, INPUT_H - 6, INPUT_W - 6)
result = np.zeros((32, INPUT_H - 6, INPUT_W - 6), np.float32)
input = inputImage.ravel()                                     # flattened C x H x W host buffer
d_output = cuda.mem_alloc(result.size * result.dtype.itemsize)
d_input = cuda.mem_alloc(input.size * input.dtype.itemsize)
bindings = [int(d_input), int(d_output)]
if USE_ASYNC:
    stream = cuda.Stream()
    cuda.memcpy_htod_async(d_input, input, stream)
    context.enqueue(1, bindings, stream.handle, None)
    cuda.memcpy_dtoh_async(result, d_output, stream)
    stream.synchronize()
else:
    cuda.memcpy_htod(d_input, input)    # blocking copy in the synchronous path
    context.execute(1, bindings)
    cuda.memcpy_dtoh(result, d_output)
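In case it is useful, this is the kind of CPU reference I would compare one output map against (only a sketch: w_oihw stands for the O x I x H x W weight array before the flattening above, not a variable in my script, and inputImage is the C x H x W float32 input):

from scipy.signal import correlate2d

def conv_ref(image_chw, w_oihw, out_map):
    # valid cross-correlation summed over input channels (what tf.nn.conv2d with VALID padding computes)
    acc = None
    for c in range(w_oihw.shape[1]):
        r = correlate2d(image_chw[c], w_oihw[out_map, c], mode='valid')
        acc = r if acc is None else acc + r
    return acc   # (INPUT_H - 6, INPUT_W - 6) for the 7x7 kernel

# e.g. np.allclose(conv_ref(inputImage, w_oihw, 0), result[0], atol=1e-3)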
By the way, as soon as I add padding with add_padding() or set_padding(), the output image comes out twisted.
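For reference, this is the padding call I mean (a sketch; with (3, 3) padding and stride 1, the 7x7 convolution should produce an INPUT_H x INPUT_W output, so I would also have to size the result buffer accordingly):

Conv1.set_padding((3, 3))                                # symmetric 'SAME'-style padding for the 7x7 kernel
result = np.zeros((32, INPUT_H, INPUT_W), np.float32)    # padded output keeps the input's spatial size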
Could anyone give me some suggestions?
Thanks