Help: I am getting a weird error when trying to deploy Triton server using a custom TLT Faster R-CNN model

Good afternoon,

This past month I have been using Triton and absolutely love it. On my local machine, everything works as expected. However, since I transferred the scripts and containers to our deployment machine, I keep getting hit with this error as soon as the client sends an inference request.

E0517 13:08:01.254093 1 logging.cc:40] Assertion failed: status == 0
/home/jenkins/workspace/OSS/L0_MergeRequest/oss/plugin/common/kernels/proposalKernel.cu:703
Aborting...

Before this error, the server is functional and loads every model properly. It waits for the client with no issue at all. However, as soon as the client calls the server over gRPC, this error occurs.

Please advise on how to fix this issue, as I have been trying to debug it for almost two weeks now with no luck.

The server is using four P100 GPUs.

Some more context:

The client is running in its own container, while the server is running in the tlt_quick_start container.

Each container is bridged over the host network.

The client is able to make metadata calls to the server, such as get model configuration and get model list. However, the server breaks as soon as the client calls the server for inference. Is this a driver issue?

The Triton server container is 20.11 and the server is running version 2.5.
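For anyone trying to reproduce this, here is a minimal sketch of the non-inference calls described above, collected into one report so the server version and model list can be compared between the local and deployment machines. It is not the original client script. The helper name `preflight` is made up for illustration, and the function only assumes an object with the same methods as `tritonclient.grpc.InferenceServerClient`.

```python
# Sketch only: `preflight` is a hypothetical helper, not part of tritonclient.
# It expects an object that behaves like tritonclient.grpc.InferenceServerClient.
def preflight(client, model_name):
    """Run the metadata-only calls and collect what the server reports."""
    report = {"live": client.is_server_live()}

    # as_json=True asks the gRPC client for plain dicts instead of protobufs.
    metadata = client.get_server_metadata(as_json=True)
    report["server_version"] = metadata.get("version")

    # The repository index is the "get model list" call.
    index = client.get_model_repository_index(as_json=True)
    report["models"] = [entry["name"] for entry in index.get("models", [])]

    # Readiness of the one model the client is about to run inference on.
    report["model_ready"] = client.is_model_ready(model_name)
    return report
```

With the real client this could be called as `preflight(tritonclient.grpc.InferenceServerClient(url="localhost:8001"), "faster_rcnn")`, where the URL and the model name are placeholders for your own setup. If every field looks healthy and the server still aborts on the first inference request, the problem is more likely in the model's plugin kernels than in the network or client.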

Solved: I had to use the Triton Server 20.10 container.