Getting '-inf' and 'nan' results from a convolution layer with TensorRT 3

I am manually building a network to port my TensorFlow model to TensorRT, but the convolution layer gives wrong results: sometimes -inf and sometimes nan.
Here is the information for the first convolution layer:
the input image size is [3,256,512]
and the weight shape is [32,3,7,7].
With these, the first convolution layer gives -inf at every pixel. Even when I use np.random to generate a random weight tensor, the result does not change.

When I use only one input channel ([1,256,512]) with weight shape [32,1,7,7], the first 21 feature maps are correct, but the other 11 feature maps are still all -inf.
I have tried every possible axis order for the input, the output and even the weights, but it does not change anything.
Here is my code:

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
builder = trt.infer.create_infer_builder(G_LOGGER)
network = builder.create_network()

data = network.add_input("data",trt.infer.DataType.FLOAT,(INPUT_C,INPUT_H,INPUT_W))

conv1_w = graph.get_tensor_by_name('model/encoder/Conv/weights/read:0')
conv1_w = sess.run(conv1_w) # --> H W I O

kernelSize = conv1_w.shape[0]
## conv1_w = np.random.randn(kernelSize,kernelSize,3,32).astype(np.float32) ## tried 

conv1_w = np.rollaxis(conv1_w,3,0)
conv1_w = np.rollaxis(conv1_w,3,1)       
conv1_w = conv1_w.reshape(-1)
conv1_w = trt.infer.Weights(conv1_w)

conv1_b = np.zeros(num_outputMaps, dtype=np.float32).reshape(-1)
conv1_b = trt.infer.Weights(conv1_b)

Conv1=network.add_convolution(data,num_outputMaps,(kernelSize,kernelSize),conv1_w,conv1_b)
Conv1.set_stride((1,1))

Conv1.get_output(0).set_name('outTsr')
network.mark_output(Conv1.get_output(0)) #

engine = builder.build_cuda_engine(network)
context = engine.create_execution_context()

result = np.zeros((32,(INPUT_H-6),(INPUT_W-6)),np.float32)

input = inputImage.ravel()# 
d_output = cuda.mem_alloc(result.size*result.dtype.itemsize)
d_input = cuda.mem_alloc(input.size*input.dtype.itemsize)
bindings = [int(d_input),int(d_output)]

if USE_ASYNC: 
	stream = cuda.Stream()
	cuda.memcpy_htod_async(d_input,input,stream)
	context.enqueue(1,bindings,stream.handle,None)
	cuda.memcpy_dtoh_async(result, d_output, stream)
	stream.synchronize()
else:
	cuda.memcpy_htod(d_input,input)
	context.execute(1,bindings)
	cuda.memcpy_dtoh(result, d_output)

By the way, as soon as I use add_padding() or set_padding(), the result image comes out distorted.
Could anyone give some suggestions?
Thanks
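
On the padding remark in the post above: the host buffer `result` is sized for an unpadded 7x7 convolution, (32, INPUT_H-6, INPUT_W-6). Below is a minimal sketch of the usual output-size arithmetic, assuming symmetric zero padding and a square kernel. The helper name `conv_output_hw` is hypothetical and not part of TensorRT. If padding enlarges the layer output while the host buffer keeps the unpadded shape, the rows would no longer line up, which might explain a distorted image; that is a guess, not something confirmed in this thread.

```python
def conv_output_hw(in_h, in_w, kernel, stride=1, pad=0):
    """Spatial output size of a 2-D convolution with symmetric zero padding."""
    out_h = (in_h + 2 * pad - kernel) // stride + 1
    out_w = (in_w + 2 * pad - kernel) // stride + 1
    return out_h, out_w

# Unpadded 7x7 convolution on a 256x512 input, as in the post.
print(conv_output_hw(256, 512, 7))         # (250, 506)

# With a padding of 3 on each side the spatial size is preserved.
print(conv_output_hw(256, 512, 7, pad=3))  # (256, 512)
```

Sizing `result` from such a helper, rather than from a hard-coded `-6`, keeps the buffer in step with whatever padding and stride the layer uses.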

Hi,

You can check this example for more information:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/topics/topics/workflows/manually_construct_tensorrt_engine.html

This example demonstrates how to manually create a TensorRT network with the weights from a PyTorch model, which is similar to your use case.
Thanks.

Thanks.
The code I wrote follows exactly the example you mentioned. I have now switched to the C++ API and get good results for the first several layers, so I think the problem comes from the Python API. In my model, the Python API sometimes works well when the number of output feature maps is less than about 20; otherwise the result is strange, usually -inf at every pixel. I don't know whether it's an API bug or whether the code I wrote is incorrect.
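
To pin down exactly which feature maps go wrong, a per-channel check on the host copy of the output may help. This is a minimal sketch, assuming the output has been copied back into a (channels, height, width) float32 array like `result` in the first post. The helper name `nonfinite_channels` and the demo array are hypothetical.

```python
import numpy as np

def nonfinite_channels(result):
    """Return the indices of channels that contain any inf or nan value."""
    flat = result.reshape(result.shape[0], -1)
    bad = ~np.isfinite(flat).all(axis=1)
    return np.flatnonzero(bad).tolist()

# Stand-in for a real output: 32 maps, the last 11 filled with -inf.
demo = np.zeros((32, 4, 5), np.float32)
demo[21:] = -np.inf
print(nonfinite_channels(demo))  # prints the indices 21 through 31
```

Running the same check over several runs would show whether the set of bad channels is stable or changes from run to run.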

Hi,

Could you share the source so that we can reproduce the Python error?
Thanks.

Sure, the model file used in the Python script is here:
https://drive.google.com/open?id=1yMz03smu_OQSQV6zWfwIrTsSUyNkFUVg

The Python script:
https://drive.google.com/open?id=1jn24gSCTvfY_OmtlxdeDHrTrW_3l9vRy

import numpy as np
import tensorflow as tf
import cv2
import scipy.misc

import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt.utils as utils

def loadImage(imgDir,width,height):
    input_image = scipy.misc.imread(imgDir, mode="RGB")
    input_image = scipy.misc.imresize(input_image, [height, width])#, interp='lanczos')
    left = input_image.astype(np.float32) / 255
    return left

INPUT_H = 256
INPUT_W = 512
INPUT_C = 3 # 1 or 3 
INPUT_C_BIAS = 0
layerNum = 0
USE_ASYNC = True
BatchSize = 1
num_output = 32
kernel_size = 7

config = tf.ConfigProto(device_count={"CPU": 4}, 
            inter_op_parallelism_threads = 1,   
            intra_op_parallelism_threads = 1,  
            log_device_placement=True,
            allow_soft_placement=True)  
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.2

pbFilePath="/home/wifispy/MonoDepth/models/model_cityscapes_pbModel.pb"

conv1_w = None 
with tf.Session(config=config) as sess:  
    output_graph_def = tf.GraphDef()
    with open(pbFilePath,'rb') as f:
        output_graph_def.ParseFromString(f.read())
        _=tf.import_graph_def(output_graph_def,name="")
    
    graph = sess.graph
    conv1_w = graph.get_tensor_by_name('model/encoder/Conv/weights/read:0')
    conv1_w = sess.run(conv1_w) # H, W, I, O, original size : [7,7,3,32]

    conv1_w = conv1_w[:,:,0:INPUT_C,:]# --- select channels : [7,7,INPUT_C,32]
    conv1_w = np.rollaxis(conv1_w,3,0) # 
    conv1_w = np.rollaxis(conv1_w,3,1) #  O, I, H, W [32,INPUT_C,7,7]
    print 'Conv Weight Shape:',conv1_w.shape

    conv1_b = graph.get_tensor_by_name('model/encoder/Conv/biases/read:0')
    conv1_b = sess.run(conv1_b)
    conv1_b = conv1_b.astype(np.float32)  

cuda.init()
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
builder = trt.infer.create_infer_builder(G_LOGGER)
network = builder.create_network()
data = network.add_input("data",trt.infer.DataType.FLOAT,(INPUT_C,INPUT_H,INPUT_W))

# conv1_w = np.random.randn(num_output,INPUT_C,kernel_size,kernel_size).astype(np.float32) 
# conv1_w = 0.05 * np.ones((num_output,INPUT_C,kernel_size,kernel_size),np.float32)
conv1_w = conv1_w.reshape(-1)
conv1_w = trt.infer.Weights(conv1_w)

# conv1_b = np.zeros(num_output, dtype=np.float32).reshape(-1)
conv1_b = trt.infer.Weights(conv1_b)

Conv1=network.add_convolution(data,num_output,(kernel_size,kernel_size),conv1_w,conv1_b)
Conv1.set_stride((1,1))
Conv1.get_output(0).set_name('outTsr')
network.mark_output(Conv1.get_output(0)) #

engine = builder.build_cuda_engine(network)
context = engine.create_execution_context()

inputImage = loadImage('/home/wifispy/timg.jpg',INPUT_W,INPUT_H)

inputImage = inputImage[:,:,0:INPUT_C] # --- H W C
inputTsr = np.swapaxes(inputImage,2,0)
inputTsr = np.swapaxes(inputTsr,2,1)
print 'Input Image Shape',inputTsr.shape # C H W 
inputTsr = inputTsr.ravel()

result = np.empty((num_output,INPUT_H-kernel_size+1,INPUT_W-kernel_size+1),np.float32)

d_input = cuda.mem_alloc(inputTsr.size*inputTsr.dtype.itemsize)
d_output = cuda.mem_alloc(result.size*result.dtype.itemsize)
bindings = [int(d_input),int(d_output)]

if USE_ASYNC:
    stream = cuda.Stream()
    cuda.memcpy_htod_async(d_input,inputTsr,stream)
    context.enqueue(1,bindings,stream.handle,None)
    cuda.memcpy_dtoh_async(result, d_output, stream)
    stream.synchronize()

else:
    cuda.memcpy_htod(d_input,inputTsr)
    context.execute(1,bindings)
    cuda.memcpy_dtoh(result, d_output)

d_input.free()
d_output.free()

for ctr in range(0,32):
    x = result[ctr,:,:]
    print 'result average in channel' ,ctr, x.reshape(-1).mean() # the result image differs from run to run
    x = (x*255).astype(np.uint8)
    cv2.imshow(str(ctr),x)
    cv2.waitKey(500)
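
The weight reordering in the script above can be checked in isolation with NumPy. This is a minimal sketch, assuming the TensorFlow kernel is laid out H, W, I, O as the script's comments state; the array `hwio` is a hypothetical stand-in for the real weights. It shows that the two `np.rollaxis` calls are equivalent to a single transpose to O, I, H, W, and it keeps the flattened copy under its own name. The last point is only a cautious habit in case the weights wrapper references the NumPy buffer instead of copying it, which is an assumption about `trt.infer.Weights` that this thread does not confirm.

```python
import numpy as np

# Hypothetical stand-in for the TensorFlow kernel: H, W, I, O = 7, 7, 3, 32.
hwio = np.arange(7 * 7 * 3 * 32, dtype=np.float32).reshape(7, 7, 3, 32)

# The two rollaxis calls from the script.
rolled = np.rollaxis(np.rollaxis(hwio, 3, 0), 3, 1)

# The same reordering as a single transpose: O, I, H, W.
oihw = np.transpose(hwio, (3, 2, 0, 1))

assert rolled.shape == oihw.shape == (32, 3, 7, 7)
assert np.array_equal(rolled, oihw)
# Element [o, i, h, w] of the result is element [h, w, i, o] of the source.
assert oihw[5, 2, 1, 4] == hwio[1, 4, 2, 5]

# transpose returns a strided view, so make an explicit contiguous float32
# copy and keep it alive under its own name (assumption: the wrapper may
# only reference this buffer).
flat_w = np.ascontiguousarray(oihw, dtype=np.float32).reshape(-1)
assert flat_w.flags['C_CONTIGUOUS']
print(flat_w.shape)
```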

Hi,

Could you tell us the data format of your model: NCHW or NHWC?
Thanks.

NHWC in TensorFlow, so I convert the data to NCHW in lines 82 and 83 of the script:

inputTsr = np.swapaxes(inputImage,2,0)
inputTsr = np.swapaxes(inputTsr,2,1)

I already get the expected result with C++, and I use the same data and weight format in both C++ and Python.

Thanks.
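
The H, W, C to C, H, W conversion quoted above can be verified on its own. This is a minimal sketch, assuming a float32 image of shape (256, 512, 3) as produced by `loadImage`; the random array `hwc` is a hypothetical stand-in for the real image. It confirms that the two `np.swapaxes` calls match a single transpose and that the flattened input is a contiguous float32 buffer of the expected byte size.

```python
import numpy as np

# Hypothetical stand-in for the loaded image: H, W, C = 256, 512, 3.
hwc = np.random.rand(256, 512, 3).astype(np.float32)

# The two swapaxes calls from the script.
swapped = np.swapaxes(np.swapaxes(hwc, 2, 0), 2, 1)

# The same reordering as a single transpose: C, H, W.
chw = np.transpose(hwc, (2, 0, 1))

assert swapped.shape == chw.shape == (3, 256, 512)
assert np.array_equal(swapped, chw)

# ravel copies here because chw is a non-contiguous view, so the flat
# array is laid out in C, H, W order in memory.
flat_in = chw.ravel()
assert flat_in.flags['C_CONTIGUOUS'] and flat_in.dtype == np.float32

# Byte count that the device input buffer has to hold.
print(flat_in.size * flat_in.dtype.itemsize)  # 1572864
```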

Please feel free to let us know if there is an issue blocking your work.