Triton Inference Server Image: nvcr.io/nvidia/tritonserver:20.03.1-py3
OS: Ubuntu 18.04
CUDA version: 10.1, cuDNN: 7.6.5
Hi, I am using Triton Inference Server for dynamic-batching-based inference.
The relevant dynamic batching setting in my config.pbtxt is:
preferred_batch_size: [8, 64]
I am able to run inference successfully with a single instance of my effdet model.
However, changing the number of model instances to anything other than 1, i.e.
count: 2 # (or 3, 4, etc.)
leads to the error: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered.
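For context, the two snippets above sit in my config.pbtxt roughly like the sketch below (the model name, platform, and max_batch_size shown here are simplified placeholders, not my exact values):

```
name: "effdet"
platform: "tensorflow_savedmodel"  # placeholder; my actual backend may differ
max_batch_size: 64

# Dynamic batcher: Triton tries to form batches of these preferred sizes
dynamic_batching {
  preferred_batch_size: [ 8, 64 ]
}

# Multiple model instances on the GPU; count > 1 triggers the error
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
```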
Referring to the attached screenshot, can someone please assist me with a solution?