Using FaceDetectIR model in Triton Server

Hello. I’ve exported a TensorRT engine file for the FaceDetectIR model and have it running successfully in Triton. However, I don’t know how to take the model’s output and annotate an image correctly. I am using the following script:

from PIL import Image, ImageDraw
import argparse
import numpy as np
import os
from builtins import range
from tensorrtserver.api import *

FLAGS = None

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v', '--verbose', action="store_true", required=False, default=False,
                        help='Enable verbose output')
    parser.add_argument('-u', '--url', type=str, required=False, default='localhost:8000',
                        help='Inference server URL. Default is localhost:8000.')
    parser.add_argument('-i', '--protocol', type=str, required=False, default='http',
                        help='Protocol ("http"/"grpc") used to ' +
                             'communicate with inference service. Default is "http".')
    parser.add_argument('-H', dest='http_headers', metavar="HTTP_HEADER",
                        required=False, action='append',
                        help='HTTP headers to add to inference server requests. ' +
                             'Format is -H"Header:Value".')

    FLAGS = parser.parse_args()
    protocol = ProtocolType.from_str(FLAGS.protocol)

    model_name = "facedetectir"
    model_input_dims = (3, 240, 384)

    model_version = -1
    batch_size = 1

    infer_contexts = []

    # Create the inference context for the model.
    infer_ctx = InferContext(FLAGS.url, protocol, model_name, model_version,
                             http_headers=FLAGS.http_headers, verbose=FLAGS.verbose)

    # load the test image and preprocess it to the model's input size
    image = Image.open('data/images/gray_thermal_face.jpg')
    resized_image = image.resize((384, 240), Image.BILINEAR)
    grayscale_image = resized_image.convert('L')
    # replicate the single grayscale channel into 3 channels, then CHW and scale to [0, 1]
    data = np.float32(np.array(grayscale_image.convert('RGB')))
    data = np.transpose(data, (2, 0, 1)) / 255

    results = infer_ctx.run({'input_1': (data,)},
                            {'output_bbox/BiasAdd': InferContext.ResultFormat.RAW,
                             'output_cov/Sigmoid': InferContext.ResultFormat.RAW},
                            batch_size)

    confidences = np.transpose(results['output_cov/Sigmoid'][0], (1, 2, 0)).reshape((15 * 24, 1))
    boxes = np.transpose(results['output_bbox/BiasAdd'][0], (1, 2, 0)).reshape((15 * 24, 4))

    draw = ImageDraw.Draw(resized_image)
    for c, b in zip(confidences, boxes):
        if c > 0.50:
            x1 = max(0, min(384, ((b[0] - (b[2] / 2)) * 384)))
            y1 = max(0, min(240, ((b[1] - (b[3] / 2)) * 240)))
            x2 = max(0, min(384, ((b[0] + (b[2] / 2)) * 384)))
            y2 = max(0, min(240, ((b[1] + (b[3] / 2)) * 240)))

            draw.rectangle(((x1, y1), (x2, y2)), outline="red")

    resized_image.save('image.jpg')

How do I annotate the bounding boxes on the image correctly?

May I know your requirement in more detail?
Currently, you can write out an image and the inferred bbox (x1, y1, x2, y2), right?
And the question is just how to annotate the inferred bbox on the image?

Morganh,

I am trying to detect faces in a thermal image; here is an example:

[attached example image: gray_thermal_face]

When I use the script above, the bboxes are not drawn around the face. I am unsure what the output from the model actually is. Based on what I read in the docs (https://ngc.nvidia.com/catalog/models/nvidia:tlt_facedetectir), the output is a normalised xc, yc, w, h, but that does not appear to work at all. Could you please advise on how to use the FaceDetectIR model correctly with Triton? Perhaps my pre/post-processing is incorrect.

If you have already generated the TRT engine, please consider using “tlt-infer” to run inference against the TRT engine first.
You can launch the Jupyter notebook to see the detectnet_v2 example.

10. Verify Deployed Model

Verify the exported model by visualizing inferences on TensorRT. In addition to running inference on a .tlt model in step 8, the tlt-infer tool is also capable of consuming the converted TensorRT engine from step 9.B.

I have reviewed the Jupyter notebook and have converted to a TensorRT engine. I am asking how to use the raw output of the model (which is what Triton gives me) to create bounding-box coordinates as x1, y1, x2, y2. The 24x15x4 bbox coordinate tensor and the 24x15x1 class confidence tensor are in an unfamiliar format, and I have been unable to convert them into a format I understand.

I suggest you run “tlt-infer” against the etlt model first. It will generate bboxes on your test image, and it will also show the labels. If that works, it will narrow down your current problem.

I ran tlt-infer and it detects the face properly, so the model is not the issue. The output label file is of no use here: it gives the correct label and coordinates, but it does not show me how to take the raw output of the DetectNet_v2 model and create x1, y1, x2, y2 coordinates myself with Triton Server.

Hope this link can help you. https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/blob/master/README.md.

Detectnet_v2

The model has the following two outputs:

  • output_cov/Sigmoid: a [batchSize, Class_Num, gridcell_h, gridcell_w] tensor that contains the number of grid cells covered by an object
  • output_bbox/BiasAdd: a [batchSize, Class_Num, 4] tensor that contains the normalized image coordinates of the object, (x1, y1) top left and (x2, y2) bottom right, with respect to the grid cell

Also, please refer to the post-processing code, which is exposed in C++ in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp.
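To make that concrete in Python, here is a minimal NumPy sketch of that parser's math, written against the script in the first post (it reuses its results dict and draw object). It assumes the DetectNet_v2 defaults used by the DeepStream sample parser, a stride of 16 and a bbox normalization of 35.0; please verify both values against your training spec.

import numpy as np

stride, bbox_norm = 16.0, 35.0
grid_h, grid_w = 15, 24  # 240/16 by 384/16

# grid-cell centers in input-image pixels, pre-divided by bbox_norm,
# mirroring gcCentersX/gcCentersY in nvdsinfer_custombboxparser.cpp
cell_x = (np.arange(grid_w) * stride + 0.5) / bbox_norm  # shape (grid_w,)
cell_y = (np.arange(grid_h) * stride + 0.5) / bbox_norm  # shape (grid_h,)

cov = results['output_cov/Sigmoid'][0][0]   # (grid_h, grid_w), single class
bbox = results['output_bbox/BiasAdd'][0]    # (4, grid_h, grid_w)

# the four bbox channels are offsets from each cell's center, so each
# corner is decoded relative to its grid cell, not read as a whole-image value
x1 = (cell_x[None, :] - bbox[0]) * bbox_norm
y1 = (cell_y[:, None] - bbox[1]) * bbox_norm
x2 = (cell_x[None, :] + bbox[2]) * bbox_norm
y2 = (cell_y[:, None] + bbox[3]) * bbox_norm

for row, col in zip(*np.where(cov > 0.5)):
    draw.rectangle((x1[row, col], y1[row, col],
                    x2[row, col], y2[row, col]), outline="red")

A full parser would also clamp the corners to the image bounds and cluster the overlapping per-cell candidates (e.g. with NMS or DBSCAN) before drawing, but this shows the raw grid-cell decoding.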

  • output_bbox/BiasAdd: a [batchSize, Class_Num, 4] tensor that contains the normalized image coordinates of the object, (x1, y1) top left and (x2, y2) bottom right, with respect to the grid cell

In my opinion, that shape is wrong; the proper shape must be:

[batchSize, Class_Num*4, gridcell_h, gridcell_w]

because that output must contain gridcell_h*gridcell_w boxes.
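
For what it is worth, that shape agrees with the tensors the script in the first post receives from Triton. A quick sanity check of the expected dimensions, assuming a single class and the 384x240 FaceDetectIR input:

# expected DetectNet_v2 output shapes for FaceDetectIR, assuming one
# class, a 384x240 input, and a stride of 16 (so a 24x15 grid)
batch, num_classes, stride = 1, 1, 16
in_h, in_w = 240, 384
grid_h, grid_w = in_h // stride, in_w // stride   # 15, 24

print((batch, num_classes, grid_h, grid_w))       # cov:  (1, 1, 15, 24)
print((batch, num_classes * 4, grid_h, grid_w))   # bbox: (1, 4, 15, 24)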

Hi @Morganh,
when I run tlt-infer on the .tlt model it works, but with the .etlt model I get this error:

2020-06-22 13:00:55,383 [INFO] iva.detectnet_v2.scripts.inference: Overlain images will be saved in the output path.
2020-06-22 13:00:55,383 [INFO] iva.detectnet_v2.inferencer.build_inferencer: Constructing inferencer
2020-06-22 13:00:55,581 [INFO] iva.detectnet_v2.inferencer.trt_inferencer: Reading from engine file at: /workspace/tmp2/experiment_dir_final/resnet18_detector.etlt
[TensorRT] ERROR: ../rtSafe/coreReadArchive.cpp (31) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-infer", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_infer.py", line 56, in main
  File "./detectnet_v2/scripts/inference.py", line 194, in main
  File "./detectnet_v2/scripts/inference.py", line 117, in inference_wrapper_batch
  File "./detectnet_v2/inferencer/trt_inferencer.py", line 380, in network_init
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

I ran this command in TLT 2.0:

tlt-infer detectnet_v2 -e /workspace/tmp2/detectnet_v2/specs/detectnet_v2_inference_kitti_etlt.txt \
                        -o /workspace/tmp2/output \
                        -i /workspace/tmp2/trainval/image \
                        -k KEY

@LoveNvidia
Please create a new topic for your “tlt-infer” issue. Thanks.