For quick use, we think TensorRT inference is more efficient. Could you check the following code?
def infer(img, context, stream, device_in, device_out, host_in, host_out):
    bindings = [int(device_in), int(device_out)]
    # Copy the flattened image into the page-locked host buffer
    np.copyto(host_in, img.ravel())
    # Async copy to device, run the engine, copy the result back
    cuda.memcpy_htod_async(device_in, host_in, stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_out, device_out, stream)
    # Wait for all queued work on the stream to finish
    stream.synchronize()
    return host_out
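With the buffers prepared in the setup below, the call is simply:
out = infer(img, context, stream, device_in, device_out, host_in, host_out)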
Before the inference itself, the project sets up the engine, context and buffers:
### Load TensorRT engine
trt_logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(trt_logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
### Prepare TRT execution context, CUDA stream and necessary buffers
context = engine.create_execution_context()
stream = cuda.Stream()
host_in = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=INPUT_DATA_TYPE)
host_out = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=INPUT_DATA_TYPE)
device_in = cuda.mem_alloc(host_in.nbytes)
device_out = cuda.mem_alloc(host_out.nbytes)
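One quick sanity check (a sketch, assuming the TensorRT 7/8-style bindings API already used above) is to print every binding, to confirm that binding 0 is really the input, binding 1 the output, and that the dtypes match INPUT_DATA_TYPE:
for i in range(engine.num_bindings):
    # name, shape, numpy dtype and direction of each binding
    print('Binding {}: name={}, shape={}, dtype={}, is_input={}'.format(
        i, engine.get_binding_name(i), engine.get_binding_shape(i),
        trt.nptype(engine.get_binding_dtype(i)), engine.binding_is_input(i)))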
And finally we use it for inference, but get a strange result.
The full code is here:
import numpy as np
import imageio
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context

ENGINE_PATH = 'final_trt_model.trt' # ADJUST
CLASSES = ['OK', 'NG1', 'NG2', 'NG3'] # ADJUST
CROP_SIZE = (3072, 2048) # ADJUST
INPUT_DATA_TYPE = np.float32 # ADJUST
MEASURE_TIME = True # ADJUST
CALC_VAL_ACCURACY = True # ADJUST
### Load TensorRT engine
trt_logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(trt_logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
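# deserialize_cuda_engine returns None on failure, so a quick check helps:
assert engine is not None, 'Failed to deserialize engine from {}'.format(ENGINE_PATH)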
### Prepare TRT execution context, CUDA stream and necessary buffers
context = engine.create_execution_context()
stream = cuda.Stream()
host_in = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=INPUT_DATA_TYPE)
host_out = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=INPUT_DATA_TYPE)
device_in = cuda.mem_alloc(host_in.nbytes)
device_out = cuda.mem_alloc(host_out.nbytes)
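Side note: host_out is allocated with INPUT_DATA_TYPE, but the output binding may use a different dtype; interpreting the buffer with the wrong type would look exactly like a strange result. A safer sketch, using trt.nptype to map the binding's TensorRT dtype to a numpy dtype:
host_out = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)),
                                 dtype=trt.nptype(engine.get_binding_dtype(1)))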
### Load and prepare input image
def prepare_image(img_in, crop_size):
    # img = utils.resize_and_crop(img_in, crop_size)
    img = img_in  # resize/crop currently disabled, crop_size unused
    img = img.astype(INPUT_DATA_TYPE)
    img = img.transpose(2, 0, 1)  # HWC -> CHW
    return img
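Note that prepare_image does no pixel normalization. If the network was trained on scaled inputs (an assumption to verify against the training pipeline), raw 0-255 values would also produce a strange result; in that case something like this would be needed:
img = img / 255.0  # hypothetical: only if training used [0, 1]-scaled inputs
# or, e.g., img = (img / 255.0 - mean) / std with the training mean/std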
INPUT_IMAGE_PATH = 'actest0001.jpg' # ADJUST
img = imageio.imread(INPUT_IMAGE_PATH, pilmode='RGB')
img = prepare_image(img, CROP_SIZE)
### Run inference
def infer(img):
    bindings = [int(device_in), int(device_out)]
    np.copyto(host_in, img.ravel())
    cuda.memcpy_htod_async(device_in, host_in, stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_out, device_out, stream)
    stream.synchronize()
    return host_out
out = infer(img)
print('Input : {}'.format(INPUT_IMAGE_PATH))
print('Output: {}'.format(out))
print('Prediction: {}'.format(CLASSES[np.argmax(out)]))
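If a confidence score is wanted as well (assuming the engine outputs raw logits, i.e. no softmax layer was exported), a small sketch:
probs = np.exp(out - np.max(out))  # numerically stable softmax
probs /= probs.sum()
print('Confidence: {:.3f}'.format(probs[np.argmax(out)]))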
### Measure execution time
if MEASURE_TIME:
    import time
    TIMEIT_N_SKIP = 10 # ADJUST
    TIMEIT_N_RUN = 20 # ADJUST
    infer_time_arr = []
    # Warm-up runs, excluded from the measurement
    for _ in range(TIMEIT_N_SKIP):
        out = infer(img)
    for _ in range(TIMEIT_N_RUN):
        time_start = time.time()
        out = infer(img)
        infer_time_arr.append(time.time() - time_start)
    print('Inference time: {:.3f} +- {:.3f} ms (Avg over {} runs, {} skipped)'.format(
        np.mean(infer_time_arr)*1000.,
        np.std(infer_time_arr)*1000.,
        TIMEIT_N_RUN, TIMEIT_N_SKIP))
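Since infer ends with stream.synchronize(), this wall-clock timing is valid; an alternative is to time on the GPU side with CUDA events (a sketch using PyCUDA's Event API):
start, end = cuda.Event(), cuda.Event()
start.record(stream)
out = infer(img)
end.record(stream)
end.synchronize()
print('GPU time: {:.3f} ms'.format(start.time_till(end)))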