Triton exiting after every batch inference on CPU

Hi, I am using an ONNX language model in Triton.
My config file content:

name: "textmodel.onnx",
platform: "onnxruntime_onnx",
backend: "onnxruntime",
version_policy: {
  latest: {
    num_versions: 1
  }
},
max_batch_size: 500000,
dynamic_batching {
  preferred_batch_size: [ 500000 ]
  max_queue_delay_microseconds: 100
}
input: [
  {
    name: "token_type_ids",
    data_type: TYPE_INT64,
    dims: [
      -1
    ],
    is_shape_tensor: false,
    allow_ragged_batch: false
  },
  {
    name: "attention_mask",
    data_type: TYPE_INT64,
    dims: [
      -1
    ],
    is_shape_tensor: false,
    allow_ragged_batch: false
  },
  {
    name: "input_ids",
    data_type: TYPE_INT64,
    dims: [
      -1
    ],
    is_shape_tensor: false,
    allow_ragged_batch: false
  }
],
output: [
  {
    name: "output_0",
    data_type: TYPE_FP32,
    dims: [86],
    label_filename: "",
    is_shape_tensor: false
  }
],
batch_input: [],
batch_output: [],
optimization: {
  priority: PRIORITY_DEFAULT,
  input_pinned_memory: {
    enable: true
  },
  output_pinned_memory: {
    enable: true
  },
  gather_kernel_buffer_threshold: 0,
  eager_batching: false
},
instance_group: [
  {
    name: "textmodel.onnx",
    kind: KIND_CPU,
    count: 1,
    gpus: [],
    secondary_devices: [],
    profile: [],
    passive: false,
    host_policy: ""
  }
]

I have 500000 rows, which I split into chunks of 10k rows for inference. I am using the gRPC client:

response = triton_client.async_infer('textmodel.onnx', inputs, callback=partial(callback, user_data), outputs=outputs, client_timeout=120)
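For readers following along: with async_infer, the completion callback is invoked with the result and the error as its last two arguments, so a server-side failure arrives in the error argument rather than as an exception at the call site. A minimal sketch of that pattern is below. The ("ok", ...) and ("error", ...) tags and the simulated completions are illustrative assumptions, not part of the original script.

```python
from functools import partial


def callback(user_data, result, error):
    """Collect one completion. After partial() binds user_data, the client
    library calls this as callback(result, error)."""
    if error is not None:
        user_data.append(("error", error))
    else:
        user_data.append(("ok", result))


user_data = []
bound = partial(callback, user_data)

# Simulated completions, standing in for what the client library would pass.
bound("fake-result", None)
bound(None, RuntimeError("onnxruntime execute failure"))

# Separate failed requests so they can be inspected or retried.
failures = [item for kind, item in user_data if kind == "error"]
print(len(user_data), "completions,", len(failures), "failed")
```

Checking the collected errors after each chunk makes it easier to see which request first triggered the allocation failure.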

Once all the batches have completed, the Triton Inference Server running on my CPU system exits and stops with this error:

(base) C:\Users\khana\Desktop\Work>python trition_grpc.py
[StatusCode.INTERNAL] onnxruntime execute failure 6: Non-zero status code returned while running DequantizeLinear node. Name:'225_DequantizeLinear' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:330 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Failed to allocate memory for requested buffer of size 25589760000

How can I avoid this? Are there any modifications I have to make?

If I send all 500000 rows in one go, what modifications do I need so that the Triton Inference Server does not stop and exit?
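One possible direction, offered as a sketch rather than a confirmed fix: since async_infer returns immediately, all chunks can end up queued on the server at once, and with max_batch_size and preferred_batch_size both set to 500000 the dynamic batcher is allowed to group them into very large batches. Limiting how many requests are in flight on the client side, together with a much smaller max_batch_size, may keep memory bounded. In the code below, run_throttled, chunk_bounds, and fake_send_async are hypothetical helpers; fake_send_async stands in for a wrapper around triton_client.async_infer so the example runs without a server.

```python
import threading


def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) index pairs covering n_rows in chunk_size steps."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def run_throttled(n_rows, chunk_size, max_in_flight, send_async):
    """Call send_async(start, stop, on_done) for every chunk while keeping
    at most max_in_flight requests outstanding."""
    slots = threading.Semaphore(max_in_flight)
    lock = threading.Lock()
    results = {}

    def record(start, result, error):
        with lock:
            results[start] = error if error is not None else result
        slots.release()  # free a slot for the next chunk

    for start, stop in chunk_bounds(n_rows, chunk_size):
        slots.acquire()  # blocks while max_in_flight requests are pending
        send_async(start, stop,
                   lambda result, error, s=start: record(s, result, error))

    # Reclaim every slot, which waits for the remaining requests to finish.
    for _ in range(max_in_flight):
        slots.acquire()
    return results


def fake_send_async(start, stop, on_done):
    # Stand-in for triton_client.async_infer(...): completes on another
    # thread and reports the number of rows as its "result".
    threading.Thread(target=on_done, args=(stop - start, None)).start()


results = run_throttled(500000, 10000, 4, fake_send_async)
print(len(results), "chunks completed")
```

With a real client, send_async would build the inputs for rows start to stop and pass on_done as the callback. The chunk size and max_in_flight values here are placeholders to tune against available RAM.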

Could you provide some information about your system configuration?

@nadeemm my system config is:
OS: Windows 10
RAM: 16 GB
CPU: i7 10th gen, 8 cores
GPU: 1650 Ti, but I am not using it; I am only using a KIND_CPU instance group

Is the config.pbtxt content above correct for this system? If not, please share suggestions or instructions to correct it so I can use it in local production. I have two ONNX models, and both use the same config.
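As a rough sanity check on the numbers in this thread: the failed allocation of 25589760000 bytes is about 23.8 GiB, which is more than the 16 GB of RAM on this machine, so the request cannot succeed regardless of other settings. The sketch below works through that arithmetic. The number also factors exactly as 10000 x 833 x 768 x 4; reading those factors as rows x tokens x hidden size x bytes per float32 is an assumption on my part, not something the log confirms, and max_rows_for_budget is a hypothetical helper.

```python
requested_bytes = 25_589_760_000  # size reported in the BFCArena error
ram_bytes = 16 * 1024**3          # 16 GB of RAM, treated here as GiB

print(f"requested: {requested_bytes / 1024**3:.2f} GiB")
print(f"relative to RAM: {requested_bytes / ram_bytes:.2f}x")

# One factorization that matches the requested size exactly.
# ASSUMPTION: rows x tokens x hidden size x bytes per float32 value.
rows, tokens, hidden, bytes_per_value = 10_000, 833, 768, 4
assert rows * tokens * hidden * bytes_per_value == requested_bytes


def max_rows_for_budget(budget_bytes, tokens, hidden, bytes_per_value=4):
    """Largest row count whose single activation tensor fits the budget."""
    return budget_bytes // (tokens * hidden * bytes_per_value)


# Rows per request if one such tensor should stay under 1 GiB.
rows_under_1gib = max_rows_for_budget(1024**3, tokens, hidden)
print("rows per request for a 1 GiB tensor:", rows_under_1gib)
```

If that reading is right, a single 10k-row chunk padded to 833 tokens is already too large for this machine, which would point toward requests of a few hundred rows and a max_batch_size in the same range rather than 500000. A model needs several such tensors at once, so the practical limit is likely lower still.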


server.log (152.9 KB)
log file

Thanks Amir,
I shall try to find someone to help with answering questions about Triton - thanks for your patience.