Triton exiting after every batch inference on CPU

amirkhan · July 26, 2021, 9:03am

Hi, I am using an ONNX language model in Triton.
My config file content:

name: "textmodel.onnx",
platform: "onnxruntime_onnx",
backend: "onnxruntime",
version_policy: {
  latest: {
    num_versions: 1
  }
},
max_batch_size: 500000,
dynamic_batching {
  preferred_batch_size: [ 500000 ]
  max_queue_delay_microseconds: 100
}
input: [
  {
    name: "token_type_ids",
    data_type: TYPE_INT64,
    dims: [
      -1
    ],
    is_shape_tensor: false,
    allow_ragged_batch: false
  },
  {
    name: "attention_mask",
    data_type: TYPE_INT64,
    dims: [
      -1
    ],
    is_shape_tensor: false,
    allow_ragged_batch: false
  },
  {
    name: "input_ids",
    data_type: TYPE_INT64,
    dims: [
      -1
    ],
    is_shape_tensor: false,
    allow_ragged_batch: false
  }
],
output: [
  {
    name: "output_0",
    data_type: TYPE_FP32,
    dims: [86],
    label_filename: "",
    is_shape_tensor: false
  }
],
batch_input: ,
batch_output: ,
optimization: {
  priority: PRIORITY_DEFAULT,
  input_pinned_memory: {
    enable: true
  },
  output_pinned_memory: {
    enable: true
  },
  gather_kernel_buffer_threshold: 0,
  eager_batching: false
},
instance_group: [
  {
    name: "textmodel.onnx",
    kind: KIND_CPU,
    count: 1,
    gpus: ,
    secondary_devices: ,
    profile: ,
    passive: false,
    host_policy: ""
  }
]

I have 500,000 rows, which I split into chunks of 10k rows for inference. I am using the gRPC client:

response = triton_client.async_infer('textmodel.onnx', inputs, callback=partial(callback, user_data), outputs=outputs, client_timeout=120)

Once all the batches are completed, the Triton Inference Server running on my CPU system exits and stops with this error:

(base) C:\Users\khana\Desktop\Work>python trition_grpc.py
[StatusCode.INTERNAL] onnxruntime execute failure 6: Non-zero status code returned while running DequantizeLinear node. Name:'225_DequantizeLinear' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:330 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Failed to allocate memory for requested buffer of size 25589760000

How can I avoid this? Are there any modifications I have to make?

If I send all 500,000 rows in one go, what modifications do I have to make so that the Triton Inference Server does not stop and exit?
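
The splitting step described in this post (500,000 rows sent in smaller chunks) might look roughly like the sketch below. It is only an illustration, not the poster's script: `iter_chunks`, `run_in_chunks`, and `infer_fn` are hypothetical names, and `infer_fn` stands in for whatever builds the inputs and calls the Triton client for one chunk.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_chunks(rows: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `rows`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def run_in_chunks(
    rows: Sequence[T],
    chunk_size: int,
    infer_fn: Callable[[Sequence[T]], List[R]],
) -> List[R]:
    """Call `infer_fn` once per chunk and concatenate the results in order.

    `infer_fn` is a placeholder (an assumption of this sketch) for the code
    that sends one chunk to the server and returns one result per row.
    """
    results: List[R] = []
    for chunk in iter_chunks(rows, chunk_size):
        results.extend(infer_fn(chunk))
    return results


if __name__ == "__main__":
    # Stand-in data and a stand-in "model" so the sketch runs without a server.
    fake_rows = list(range(25))
    doubled = run_in_chunks(fake_rows, 10, lambda chunk: [r * 2 for r in chunk])
    print(len(doubled), "results from", len(fake_rows), "rows")
```

Sending one chunk at a time keeps the number of rows held by the server bounded by `chunk_size`, which makes it easier to find a size that fits in memory.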

nadeemm · July 27, 2021, 12:58am

Could you provide some information about your system configuration?

amirkhan · July 27, 2021, 4:10am

@nadeemm my system config is:
Windows 10
RAM: 16 GB
i7 10th gen, 8 cores
1650 Ti GPU, but I am not using it; I am only using a KIND_CPU instance group

Is the config.pbtxt content above correct for this system? If not, please share suggestions or instructions to correct it for use in local production. I have two ONNX models, and both use the same config.
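
For scale, the failed allocation reported in the error message can be compared with the 16 GB of RAM listed here. The sketch below is plain arithmetic on the number from the log. It assumes, without confirmation from the thread, that the failing request carried one 10,000-row chunk and that this buffer grows linearly with the row count; the 4 GiB budget is a made-up figure for illustration.

```python
# Size of the buffer ONNX Runtime failed to allocate, copied from the error message.
FAILED_ALLOCATION_BYTES = 25_589_760_000

# Assumption: the failing request contained one 10k-row chunk.
ROWS_PER_REQUEST = 10_000

GIB = 2 ** 30


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def max_rows_within(budget_bytes: int, bytes_per_row: int) -> int:
    """Largest row count whose buffer fits in `budget_bytes`, assuming linear scaling."""
    return budget_bytes // bytes_per_row


# Per-row share of this single buffer under the assumptions above.
bytes_per_row = FAILED_ALLOCATION_BYTES // ROWS_PER_REQUEST

if __name__ == "__main__":
    print(f"requested buffer: {bytes_to_gib(FAILED_ALLOCATION_BYTES):.2f} GiB")
    print(f"per row (assumed 10k rows): {bytes_per_row} bytes")
    # Hypothetical budget of 4 GiB for this one buffer on a 16 GB machine.
    print(f"rows fitting in 4 GiB: {max_rows_within(4 * GIB, bytes_per_row)}")
```

The single buffer alone is larger than the machine's total RAM, so under these assumptions a 10k-row chunk cannot fit regardless of other settings. Other tensors and the model weights also need memory, so the real usable batch size would likely be lower than this estimate.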

amirkhan · July 30, 2021, 6:19am

server.log (152.9 KB)
log file

nadeemm · August 26, 2021, 12:04am

Thanks Amir,
I shall try to find someone to help with answering questions about Triton - thanks for your patience.

nadeemm · September 30, 2021, 10:30pm

The process to get help with Triton is now exclusively via GitHub. If you would still like a response, please consider re-posting your question on: Triton Inference Server · GitHub. The NVIDIA and other teams will be able to help you there.
Sorry for the inconvenience and thanks for your patience.
