NVIDIA Triton Inference Server: decoding a NumPy array response


I’m setting up a Triton Server to run inference on my models. It took me some time to set everything up so my pipelines can make inference requests to the server, but now it seems to be working. However, I’m having trouble with the InferenceResponse I get back from the server.

I’m using the TensorFlow backend because the model I’m working with is a TensorFlow model. Sending the request raises no errors or exceptions, but the response has a different format and shape than the one I’m expecting, and I’m not sure which conversion is appropriate (if any!). My model config file states the following about the output:

output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [ 1917, 17 ]
    label_filename: "labels.txt"
  }
]

With the given response, if I do response.get_output('predictions') I get the following:

tensor: {'name': 'predictions', 'datatype': 'BYTES', 'shape': [4, 3], 'parameters': {'binary_data_size': 208}}

If I do response.as_numpy('predictions') I get the following bytes NumPy array, which looks odd to me and which I don’t know how to interpret:

[[b'6.252468:9676' b'6.248357:8707' b'6.226352:10645']
 [b'6.252468:9676' b'6.248357:8707' b'6.226352:10645']
 [b'6.252468:9676' b'6.248357:8707' b'6.226352:10645']
 [b'6.252468:9676' b'6.248357:8707' b'6.226352:10645']]
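
For what it's worth, each entry above looks like a "score:index" pair, which matches the string form Triton's classification extension uses ("score:index", optionally followed by ":label"). Below is a minimal sketch of splitting such entries apart, not a definitive decoding. The helper name parse_entry is hypothetical, the rows are copied from the output above rather than read from a live response, and the mapping of the index back to a (row, column) position assumes it is a flat, row-major offset into the [1917, 17] tensor.

```python
def parse_entry(entry: bytes):
    """Split one b'score:index[:label]' entry into (score, index, label).

    Hypothetical helper; label is None when the entry has no third field.
    """
    parts = entry.decode("utf-8").split(":", 2)
    score = float(parts[0])
    index = int(parts[1])
    label = parts[2] if len(parts) > 2 else None
    return score, index, label


# Rows copied from the output above. With a real response this could be
# response.as_numpy('predictions').tolist(), which yields nested lists of bytes.
rows = [
    [b'6.252468:9676', b'6.248357:8707', b'6.226352:10645'],
    [b'6.252468:9676', b'6.248357:8707', b'6.226352:10645'],
    [b'6.252468:9676', b'6.248357:8707', b'6.226352:10645'],
    [b'6.252468:9676', b'6.248357:8707', b'6.226352:10645'],
]

NUM_COLS = 17  # second dimension of the [1917, 17] output in the model config

# One list of (score, index, label) tuples per row of the response
decoded = [[parse_entry(entry) for entry in row] for row in rows]

for score, flat_index, label in decoded[0]:
    # Assumption: the index is a flat row-major offset into the [1917, 17] tensor
    row_idx, col_idx = divmod(flat_index, NUM_COLS)
    print(score, flat_index, (row_idx, col_idx), label)
```

If the raw [1917, 17] float tensor is what is wanted, one thing worth checking is whether the request sets class_count on the requested output (for example InferRequestedOutput('predictions', class_count=3)). Leaving class_count at its default of 0 should make the server return the tensor itself rather than classification strings.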

Does anyone know how to properly get the expected [ 1917, 17 ] sized response?

Any feedback will be highly appreciated.