[TAO] Using a TAO-generated TRT engine with TensorRT: repeated inference calls return the same output

• Hardware (RTX5000)
• Network Type (Classification)
• TLT Version (not sure how to get it, but TF is 15.5)
• Training spec file (If have, please share here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

Following the TAO doc, we generated the TRT engine file in our environment (using tao-converter).
Our inference code is based on: tensorrt-demo/trt_infer.py at master · dkorobchenko-nv/tensorrt-demo · GitHub

When we run inference with repeated calls, we get identical results for different images. For example, jpg1 and jpg2/3/4/5/6 all produce the same output [NG2] with this engine, even though the pictures are different.

The code is:

    if flg in file:
        print(file)
        INPUT_IMAGE_PATH = dir_path + "/" + file
        # print(INPUT_IMAGE_PATH)
        time_start = time.time()
        out = tensorRT_v2.tensorTrt_process(INPUT_IMAGE_PATH, ENGINE_PATH, CROP_SIZE)

tensorRT_v2.tensorTrt_process is based on the code in tensorrt-demo/trt_infer.py at master · dkorobchenko-nv/tensorrt-demo · GitHub

Could you give us some advice on this setup? Or could you show us an example of making multiple inference calls with TensorRT?

Hi,

Usually there are the following ways to run inference against a classification model.

  1. Run inference with “tao classification inference xxx”. This will run inference against the .tlt model.
    You can set “-d” to run inference against a folder of images.

Refer to https://docs.nvidia.com/tao/tao-toolkit/text/image_classification.html#running-inference-on-a-model

  2. Run inference with DeepStream. This can run inference against the .etlt model or a TensorRT engine (i.e. a .trt or .engine file).

Refer to https://docs.nvidia.com/tao/tao-toolkit/text/image_classification.html#deploying-to-deepstream

Also there are some tips in this topic for reference.

https://forums.developer.nvidia.com/t/issue-with-image-classification-tutorial-and-testing-with-deepstream-app/165835/21?u=morganh

https://forums.developer.nvidia.com/t/issue-with-image-classification-tutorial-and-testing-with-deepstream-app/165835/32?u=morganh

  3. Run inference with a standalone script. This can run inference against a TensorRT engine (i.e. a .trt or .engine file).

Refer to the unofficial links https://forums.developer.nvidia.com/t/inferring-resnet18-classification-etlt-model-with-python/167721/41 and

https://forums.developer.nvidia.com/t/inferring-resnet18-classification-etlt-model-with-python/167721/10?u=morganh

  4. Also, users can refer to the triton-tao apps. See GitHub - NVIDIA-AI-IOT/tao-toolkit-triton-apps: Sample app code for deploying TAO Toolkit trained models to Triton

For quick use, we think the standalone trt-infer approach is the most efficient for us. Could you check our code?

def infer(img, context, stream, device_in, device_out, host_in, host_out):
    bindings = [int(device_in), int(device_out)]
    np.copyto(host_in, img.ravel())                       # fill pinned input buffer
    cuda.memcpy_htod_async(device_in, host_in, stream)    # host -> device
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_out, device_out, stream)  # device -> host
    stream.synchronize()
    return host_out
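
One detail worth double-checking when this runs in a loop (an assumption on our side, not a confirmed root cause): infer() returns the page-locked host_out buffer itself, and that buffer is rewritten on every call. If the caller keeps the returned arrays and only reads them later, every stored result aliases the same memory and will show identical values. A minimal sketch of a safer pattern (the name infer_copy is ours):

def infer_copy(img, context, stream, device_in, device_out, host_in, host_out):
    # Same steps as infer() above, but hand back an independent array
    # instead of the shared page-locked buffer.
    bindings = [int(device_in), int(device_out)]
    np.copyto(host_in, img.ravel())
    cuda.memcpy_htod_async(device_in, host_in, stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_out, device_out, stream)
    stream.synchronize()
    return host_out.copy()  # detach the result from the reused buffer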

And before the inference, the project runs the following setup:

### Load TensorRT engine

trt_logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(trt_logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

### Prepare TRT execution context, CUDA stream and necessary buffers

context = engine.create_execution_context()
stream = cuda.Stream()
host_in = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=INPUT_DATA_TYPE)
host_out = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=INPUT_DATA_TYPE)
device_in = cuda.mem_alloc(host_in.nbytes)
device_out = cuda.mem_alloc(host_out.nbytes)
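
To rule out a mismatch between what the engine expects and what you feed it (your pictures are 3072x2048), it can help to print the binding shapes and dtypes right after this setup. A minimal sketch, using only the engine object already loaded above:

# Sketch: confirm the input binding really is CHW 3 x 2048 x 3072 (or
# whatever the engine was built with) and that its dtype matches INPUT_DATA_TYPE.
for idx in range(engine.num_bindings):
    print('Binding {}: name={}, shape={}, dtype={}, is_input={}'.format(
        idx,
        engine.get_binding_name(idx),
        engine.get_binding_shape(idx),
        engine.get_binding_dtype(idx),
        engine.binding_is_input(idx)))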

And finally we use it to run inference, but get the strange result.

Full code is here:


import numpy as np
import imageio
import pycuda.autoinit  # initializes a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

ENGINE_PATH = 'final_trt_model.trt' # ADJUST
CLASSES = ['OK', 'NG1', 'NG2', 'NG3'] # ADJUST
CROP_SIZE = (3072, 2048) # ADJUST
INPUT_DATA_TYPE = np.float32 # ADJUST
MEASURE_TIME = True # ADJUST
CALC_VAL_ACCURACY = True # ADJUST

### Load TensorRT engine

trt_logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(trt_logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

### Prepare TRT execution context, CUDA stream and necessary buffers

context = engine.create_execution_context()
stream = cuda.Stream()
host_in = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=INPUT_DATA_TYPE)
host_out = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=INPUT_DATA_TYPE)
device_in = cuda.mem_alloc(host_in.nbytes)
device_out = cuda.mem_alloc(host_out.nbytes)

### Load and prepare input image

def prepare_image(img_in, crop_size):
    # img = utils.resize_and_crop(img_in, crop_size)
    img = img_in
    img = img.astype(INPUT_DATA_TYPE)
    img = img.transpose(2, 0, 1) # to CHW
    return img

INPUT_IMAGE_PATH = 'actest0001.jpg' # ADJUST
img = imageio.imread(INPUT_IMAGE_PATH, pilmode='RGB')
img = prepare_image(img, CROP_SIZE)

### Run inference

def infer(img):
    bindings = [int(device_in), int(device_out)]
    np.copyto(host_in, img.ravel())
    cuda.memcpy_htod_async(device_in, host_in, stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_out, device_out, stream)
    stream.synchronize()
    return host_out

out = infer(img)
print('Input : {}'.format(INPUT_IMAGE_PATH))
print('Output: {}'.format(out))
print('Prediction: {}'.format(CLASSES[np.argmax(out)]))

### Measure execution time

if MEASURE_TIME:
    import time
    TIMEIT_N_SKIP = 10 # ADJUST
    TIMEIT_N_RUN = 20 # ADJUST
    infer_time_arr = []
    for _ in range(TIMEIT_N_SKIP):
        out = infer(img)
    for _ in range(TIMEIT_N_RUN):
        time_start = time.time()
        out = infer(img)
        infer_time_arr.append(time.time() - time_start)
    print('Inference time: {:.3f} +- {:.3f} ms (Avg over {} runs, {} skipped)'.format(
        np.mean(infer_time_arr)*1000.,
        np.std(infer_time_arr)*1000.,
        TIMEIT_N_RUN, TIMEIT_N_SKIP))
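
One thing that stands out in the full code above: prepare_image() has the resize/crop call commented out and applies no mean/scale normalization, so whatever imageio loads is fed straight to the engine and CROP_SIZE is effectively unused. As a hedged sketch only (the exact resize target and normalization values depend on how your classification model was trained; please confirm them against the linked forum posts), the preprocessing would look roughly like this:

import cv2  # assumption: OpenCV is available for resizing

def prepare_image_sketch(img_in, net_w, net_h):
    # Hypothetical preprocessing; net_w/net_h should match the engine's
    # input binding, e.g. c, h, w = engine.get_binding_shape(0).
    img = cv2.resize(img_in, (net_w, net_h))   # resize to the engine's W, H
    img = img.astype(INPUT_DATA_TYPE)
    # Placeholder: apply the same mean subtraction / scaling used at
    # training time here (see the forum links above for the exact values).
    img = img.transpose(2, 0, 1)               # HWC -> CHW
    return img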

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

Hi,
As mentioned above, can you refer to https://forums.developer.nvidia.com/t/inferring-resnet18-classification-etlt-model-with-python/167721/41 and https://forums.developer.nvidia.com/t/inferring-resnet18-classification-etlt-model-with-python/167721/10?u=morganh ?
Please see the code in the 1st link and the modification in the 2nd link.


I ran some training and got correct results during TensorRT engine inference.

Please set the correct input_image_size as below, because your training images are 3072x2048:
input_image_size: “3,2048,3072”

Also, the inference script should follow the one referenced above.
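
Since the original question also asked for an example of multiple inference calls against one TensorRT engine, here is a minimal sketch under the same assumptions as the code above: the engine, context, stream and buffers are created once, the hypothetical prepare_image_sketch() from the earlier note handles preprocessing, and the output is copied per image so results do not alias the shared host buffer. The folder path is just a placeholder.

import glob

# Sketch: reuse one engine/context/stream and one set of buffers for a
# whole folder of images; only the host buffers are rewritten per call.
results = {}
for path in sorted(glob.glob('images/*.jpg')):                 # placeholder folder
    img = imageio.imread(path, pilmode='RGB')
    img = prepare_image_sketch(img, net_w=3072, net_h=2048)    # match engine input
    np.copyto(host_in, img.ravel())
    cuda.memcpy_htod_async(device_in, host_in, stream)
    context.execute_async(bindings=[int(device_in), int(device_out)],
                          stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_out, device_out, stream)
    stream.synchronize()
    results[path] = host_out.copy()                            # detach from shared buffer
    print(path, CLASSES[np.argmax(results[path])])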