Is binary/raw tensor output supported in Triton Inference Server

Hi, I need to do some postprocessing on the output tensor to get the final prediction results. I could either implement it in Triton (not sure how) or have Triton return the raw tensor to me.

I found the following doc, which describes that this should be possible:

When I configure my model output like so:

output: [
    {
        name: "Identity:0"
        data_type: TYPE_FP32
        dims: [
                3,
                256,
                512
              ]
        label_filename: ""
        parameters: {
            binary_data: true
        }
    }
]

I get this error:

[libprotobuf ERROR /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/grpc-repo/src/grpc/third_party/protobuf/src/google/protobuf/text_format.cc:317] Error parsing text-format inference.ModelConfig: 36:11: Message type "inference.ModelOutput" has no field named "parameters".

It seems that parameters can’t be specified for an input or output. I also don’t find a parameters field for outputs in the protobuf file:

However, I see the binary_tensor_data server extension enabled in the startup log below:

+--------------------------+--------------------------------------------------------------+
| Option                   | Value                                                        |
+--------------------------+--------------------------------------------------------------+
| server_id                | triton                                                       |
| server_version           | 2.21.0                                                       |
| server_extensions        | classification sequence model_repository                     |
|                          | model_repository(unload_dependents) schedule_policy          |
|                          | model_configuration system_shared_memory cuda_shared_memory  |
|                          | binary_tensor_data statistics trace                          |
| model_repository_path[0] | /models                                                      |
+--------------------------+--------------------------------------------------------------+

I’m running the latest Docker image, nvcr.io/nvidia/tritonserver:22.04-py3, and I also tried nvcr.io/nvidia/tritonserver:21.07-py3 with the same result.
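
For completeness, here is roughly how I’m calling the server from the Python client and where I expected the binary/raw output to come into play (just a sketch; the URL, model name, input name and shapes are placeholders from my setup, not from the docs):

import numpy as np
import tritonclient.http as httpclient

# Placeholders: "my_model" and "input_1:0" are from my setup.
client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.zeros((3, 256, 512), dtype=np.float32)  # dummy input, the real one is a preprocessed image
infer_input = httpclient.InferInput("input_1:0", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image, binary_data=True)

# This is where I assumed binary/raw output would be requested,
# per the binary_tensor_data extension.
requested_output = httpclient.InferRequestedOutput("Identity:0", binary_data=True)

result = client.infer("my_model", inputs=[infer_input], outputs=[requested_output])
raw = result.as_numpy("Identity:0")  # expecting a (3, 256, 512) float32 array
print(raw.shape, raw.dtype)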

Do I need to build a custom image, enable the extension somehow, or is it simply not supported? I’m a bit confused by the docs and by the extension being listed as available.

If this feature is not supported, is there a way to provide a custom postprocessing function to be run on the Triton server?
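
To illustrate the kind of postprocessing I mean, below is a sketch of what I imagine running server side, based on the Python backend’s model.py interface (assuming that is the intended mechanism; the tensor names and the argmax step are just examples from my use case):

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    # Sketch only: turn the raw (3, 256, 512) logits into a (256, 512) class-index mask.
    # "Identity:0" and "post_output" are placeholder tensor names.
    def execute(self, requests):
        responses = []
        for request in requests:
            logits = pb_utils.get_input_tensor_by_name(request, "Identity:0").as_numpy()
            mask = np.argmax(logits, axis=0).astype(np.int32)
            out = pb_utils.Tensor("post_output", mask)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

I presume this would then be chained with the original model in an ensemble, but I’m not sure whether that’s the recommended approach.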

Thank you,
Alex