Triton server 20.02/20.03 GPU memory leaks [bug https://developer.nvidia.com/nvidia_bug/3061266]

Hi, I am trying to use CUDA shared memory to communicate with Triton.
My code is based on
https://github.com/NVIDIA/triton-inference-server/blob/r20.02/src/clients/c%2B%2B/examples/simple_cuda_shm_client.cc,
but I always get GPU memory leaks: roughly 2 MB per run for this model, and up to 1 GB for more complex models.
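
For reference, here is a minimal sketch of the CUDA side of such a client (region sizes are assumed from the model config below: 512x512x3 FP32 input, 512x512x2 FP32 output, batch 1). The actual register/unregister calls against the server follow the linked simple_cuda_shm_client.cc and are only indicated by comments:

```cpp
// Sketch only: allocate the CUDA shared-memory regions once and export IPC
// handles for them. Sizes are taken from the config below; the Triton client
// register/unregister calls are per simple_cuda_shm_client.cc.
#include <cuda_runtime.h>
#include <cstddef>

int main() {
  const size_t input_byte_size  = 512 * 512 * 3 * sizeof(float);
  const size_t output_byte_size = 512 * 512 * 2 * sizeof(float);

  cudaSetDevice(1);  // the model instance runs on GPU 1 (see instance_group)

  // Allocate the regions once, before the inference loop.
  void* d_input = nullptr;
  void* d_output = nullptr;
  cudaMalloc(&d_input, input_byte_size);
  cudaMalloc(&d_output, output_byte_size);

  // Export IPC handles for the two regions; these handles are what the client
  // passes to the server when registering the CUDA shared-memory regions.
  cudaIpcMemHandle_t input_handle, output_handle;
  cudaIpcGetMemHandle(&input_handle, d_input);
  cudaIpcGetMemHandle(&output_handle, d_output);

  // ... register the regions with the server, run inference requests,
  //     then unregister the regions (as in simple_cuda_shm_client.cc) ...

  cudaFree(d_input);
  cudaFree(d_output);
  return 0;
}
```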

The model has fixed input dimensions.
To reproduce the memory leak I used batch size 1.
(For other, more complex models with dynamic dims / larger batches, the leak is larger.)
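
The growth can be seen by querying free device memory after each request, e.g. with a small helper like this (a sketch of the observation, not the client itself):

```cpp
// Sketch: query device-wide free memory on the GPU that serves the model
// after each request. With the setup above, free memory drops by roughly
// 2 MB per run and is not returned.
#include <cuda_runtime.h>
#include <cstdio>

void report_free_gpu_memory(int run) {
  size_t free_bytes = 0, total_bytes = 0;
  cudaSetDevice(1);  // GPU that runs the model instance
  cudaMemGetInfo(&free_bytes, &total_bytes);
  std::printf("run %d: free %zu MiB / total %zu MiB\n",
              run, free_bytes >> 20, total_bytes >> 20);
}

// Called once per inference request in the client loop (batch size 1):
//   for (int run = 0; run < num_runs; ++run) {
//     // fill the input region, send the request, read the output region
//     report_free_gpu_memory(run);
//   }
```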

Triton server status (model config and inference stats):
id: "inference:0"
version: "1.11.0"
uptime_ns: 560023510734
model_status {
  key: "Deeplab"
  value {
    config {
      name: "Deeplab"
      platform: "tensorrt_plan"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 4
      input {
        name: "input_1_9:0"
        data_type: TYPE_FP32
        dims: 512
        dims: 512
        dims: 3
      }
      output {
        name: "bilinear_upsampling_2_5/ResizeBilinear:0"
        data_type: TYPE_FP32
        dims: 512
        dims: 512
        dims: 2
      }
      instance_group {
        name: "Deeplab_0"
        count: 1
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.plan"
      dynamic_batching {
        preferred_batch_size: 2
        preferred_batch_size: 4
        max_queue_delay_microseconds: 2000
      }
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        infer_stats {
          key: 1
          value {
            success {
              count: 40
              total_time_ns: 1224701969
            }
            compute {
              count: 40
              total_time_ns: 1215963464
            }
            queue {
              count: 40
              total_time_ns: 4083860
            }
          }
        }
        model_execution_count: 20
        model_inference_count: 40
        ready_state_reason {
        }
        last_inference_timestamp_milliseconds: 13928679385895380608
      }
    }
  }
}
ready_state: SERVER_READY