Triton Inference Server is sending back "HTTP/1.1 400 Bad Request"

Hi

I am setting up the inference server following the instructions in the quickstart guide:

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/quickstart.html
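
For context, I populated the example model repository as described there, roughly as follows (assuming I recall the script name correctly):

cd ~/wd500gb/github/triton-inference-server/docs/examples
./fetch_models.sh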

My setup is as follows:

GPU: RTX 2080 Ti
CUDA version: 10.2
Driver: 450.57
cuDNN: libcudnn8_8.0.2.39-1+cuda10.2_amd64

The inference server apparently starts and listens as described in the guide. However, when I run the curl test command, I don't receive a 200 from the server; instead I get the following:

Command: curl -v localhost:8000/v2/health/ready

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
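
For comparison, the quickstart shows a ready server answering this probe with an empty 200 response, roughly:

< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain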

Please help me with this issue. Thanks.

Regards,
Ghazni

=============================
== Command used to start the inference server

docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v/home/mgsaeed/wd500gb/github/triton-inference-server/docs/examples/model_repository:/models \
  nvcr.io/nvidia/tritonserver:20.07-v1-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==

NVIDIA Release 20.07 (build 14602913)

Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

2020-08-10 16:11:10.765523: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
I0810 16:11:10.792665 1 metrics.cc:164] found 1 GPUs supporting NVML metrics
I0810 16:11:10.798194 1 metrics.cc:173] GPU 0: GeForce RTX 2080 Ti
I0810 16:11:10.798388 1 server.cc:127] Initializing Triton Inference Server
I0810 16:11:10.955257 1 server_status.cc:55] New status tracking for model 'densenet_onnx'
I0810 16:11:10.955277 1 server_status.cc:55] New status tracking for model 'inception_graphdef'
I0810 16:11:10.955281 1 server_status.cc:55] New status tracking for model 'resnet50_netdef'
I0810 16:11:10.955285 1 server_status.cc:55] New status tracking for model 'simple'
I0810 16:11:10.955288 1 server_status.cc:55] New status tracking for model 'simple_string'
I0810 16:11:10.955312 1 model_repository_manager.cc:723] loading: simple:1
I0810 16:11:10.955387 1 model_repository_manager.cc:723] loading: simple_string:1
I0810 16:11:10.955491 1 model_repository_manager.cc:723] loading: resnet50_netdef:1
I0810 16:11:10.955541 1 model_repository_manager.cc:723] loading: inception_graphdef:1
I0810 16:11:10.955609 1 model_repository_manager.cc:723] loading: densenet_onnx:1
I0810 16:11:10.957881 1 base_backend.cc:176] Creating instance simple_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.957958 1 base_backend.cc:176] Creating instance simple_string_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.958099 1 base_backend.cc:176] Creating instance inception_graphdef_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.984917 1 onnx_backend.cc:203] Creating instance densenet_onnx_0_gpu0 on GPU 0 (7.5) using model.onnx
2020-08-10 16:11:10.988128: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2020-08-10 16:11:10.989941: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2a34088380 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-10 16:11:10.989983: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-10 16:11:10.990153: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-10 16:11:10.991876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:10.991917: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:10.991964: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:10.991997: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:10.992066: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:10.998282: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:10.998368: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:10.998399: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:11.003025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
I0810 16:11:11.064634 1 netdef_backend.cc:206] Creating instance resnet50_netdef_0_gpu0 on GPU 0 (7.5) using init_model.netdef and model.netdef
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
2020-08-10 16:11:12.082177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.082227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.082238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.088323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
2020-08-10 16:11:12.092148: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2a34766940 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-10 16:11:12.092182: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-08-10 16:11:12.094081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:12.094142: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:12.094158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:12.094173: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:12.094187: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:12.094215: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:12.094249: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:12.094261: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:12.097381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-08-10 16:11:12.097420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.097431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.097440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.100445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
2020-08-10 16:11:12.102552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:12.102626: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:12.102656: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:12.102684: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:12.102705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:12.102749: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:12.102770: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:12.102789: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:12.106010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-08-10 16:11:12.106055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.106072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.106091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.109675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
I0810 16:11:12.110803 1 model_repository_manager.cc:888] successfully loaded 'simple' version 1
I0810 16:11:12.110835 1 model_repository_manager.cc:888] successfully loaded 'simple_string' version 1
I0810 16:11:12.203993 1 model_repository_manager.cc:888] successfully loaded 'inception_graphdef' version 1
I0810 16:11:12.813593 1 model_repository_manager.cc:888] successfully loaded 'densenet_onnx' version 1
I0810 16:11:12.892981 1 model_repository_manager.cc:888] successfully loaded 'resnet50_netdef' version 1
Starting endpoints, 'inference:0' listening on
I0810 16:11:12.895164 1 grpc_server.cc:1942] Started GRPCService at 0.0.0.0:8001
I0810 16:11:12.895179 1 http_server.cc:1428] Starting HTTPService at 0.0.0.0:8000
I0810 16:11:12.936628 1 http_server.cc:1443] Starting Metrics Service at 0.0.0.0:8002

Please try:
$ curl localhost:8000/api/status
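
If the server is running the legacy v1 HTTP API, the readiness probe also lives under /api; if I remember correctly:

$ curl localhost:8000/api/health/ready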

Thanks. I have tried that and it seems to work, though the response is not the same: there is no "HTTP/1.1 200 OK", but it does print the status of the various models.

After printing a lot of text, it ends with SERVER_READY. Is that the correct/expected response? Thanks.

Regards,
ghazni

curl localhost:8000/api/status
id: "inference:0"
version: "1.15.0"
uptime_ns: 62975350798
model_status {
  key: "densenet_onnx"
  value {
    config {
      name: "densenet_onnx"
      platform: "onnxruntime_onnx"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      input {
        name: "data_0"
        data_type: TYPE_FP32
        format: FORMAT_NCHW
        dims: 3
        dims: 224
        dims: 224
        reshape {
          shape: 1
          shape: 3
          shape: 224
          shape: 224
        }
      }
      output {
        name: "fc6_1"
        data_type: TYPE_FP32
        dims: 1000
        label_filename: "densenet_labels.txt"
        reshape {
          shape: 1
          shape: 1000
          shape: 1
          shape: 1
        }
      }
      instance_group {
        name: "densenet_onnx"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.onnx"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
model_status {
  key: "inception_graphdef"
  value {
    config {
      name: "inception_graphdef"
      platform: "tensorflow_graphdef"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 128
      input {
        name: "input"
        data_type: TYPE_FP32
        format: FORMAT_NHWC
        dims: 299
        dims: 299
        dims: 3
      }
      output {
        name: "InceptionV3/Predictions/Softmax"
        data_type: TYPE_FP32
        dims: 1001
        label_filename: "inception_labels.txt"
      }
      instance_group {
        name: "inception_graphdef"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.graphdef"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
model_status {
  key: "resnet50_netdef"
  value {
    config {
      name: "resnet50_netdef"
      platform: "caffe2_netdef"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 128
      input {
        name: "gpu_0/data"
        data_type: TYPE_FP32
        format: FORMAT_NCHW
        dims: 3
        dims: 224
        dims: 224
      }
      output {
        name: "gpu_0/softmax"
        data_type: TYPE_FP32
        dims: 1000
        label_filename: "resnet50_labels.txt"
      }
      instance_group {
        name: "resnet50_netdef"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.netdef"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
model_status {
  key: "simple"
  value {
    config {
      name: "simple"
      platform: "tensorflow_graphdef"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 8
      input {
        name: "INPUT0"
        data_type: TYPE_INT32
        dims: 16
      }
      input {
        name: "INPUT1"
        data_type: TYPE_INT32
        dims: 16
      }
      output {
        name: "OUTPUT0"
        data_type: TYPE_INT32
        dims: 16
      }
      output {
        name: "OUTPUT1"
        data_type: TYPE_INT32
        dims: 16
      }
      instance_group {
        name: "simple"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.graphdef"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
model_status {
  key: "simple_string"
  value {
    config {
      name: "simple_string"
      platform: "tensorflow_graphdef"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 8
      input {
        name: "INPUT0"
        data_type: TYPE_STRING
        dims: 16
      }
      input {
        name: "INPUT1"
        data_type: TYPE_STRING
        dims: 16
      }
      output {
        name: "OUTPUT0"
        data_type: TYPE_STRING
        dims: 16
      }
      output {
        name: "OUTPUT1"
        data_type: TYPE_STRING
        dims: 16
      }
      instance_group {
        name: "simple_string"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.graphdef"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
ready_state: SERVER_READY

OK, I went ahead and pulled/ran the clientsdk example as follows (I have edited this post after finding new information).

docker pull nvcr.io/nvidia/tritonserver:20.07-py3-clientsdk
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.07-py3-clientsdk

After this, from inside the container:
/workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

However, I got the following error:
error: failed to get model metadata: failed to parse the request JSON buffer: The document is empty. at 0 …

OK, this issue was resolved by pulling the previous clientsdk image associated with tensorrtserver:
docker pull nvcr.io/nvidia/tensorrtserver:20.02-py3-clientsdk
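
For reference, the v2 quickstart drives the same client binary against the densenet_onnx model; assuming a v2 server is running, its documented invocation is:

/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg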

So, in summary:
1: curl -v localhost:8000/v2/health/ready (not working; getting 400)
2: curl localhost:8000/api/status (apparently working; details in my previous message)
3: /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg (not working with tritonserver:20.07-py3-clientsdk)
4: /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg (working with nvcr.io/nvidia/tensorrtserver:20.02-py3-clientsdk)

Are those expected results? Many thanks.

Regards,
Ghazni

For the server status: if it reports SERVER_READY, the server itself should be fine.
For items 3 and 4, you have actually narrowed it down, so please check the difference between the SDKs. The 20.07 clientsdk speaks the new v2 (KFServing) protocol, while your server was started from the 20.07-v1-py3 image, which serves only the legacy v1 API; that would also explain the 400 from /v2/health/ready and the empty JSON document seen by the v2 client.
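
A minimal check, assuming the matching v2 server image is simply the same tag without the -v1 suffix (nvcr.io/nvidia/tritonserver:20.07-py3, as in the quickstart): start the server from it and re-run the readiness probe.

docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v/home/mgsaeed/wd500gb/github/triton-inference-server/docs/examples/model_repository:/models \
  nvcr.io/nvidia/tritonserver:20.07-py3 tritonserver --model-repository=/models

# From another shell; a ready v2 server should answer "HTTP/1.1 200 OK"
curl -v localhost:8000/v2/health/ready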

Many thanks, Morganh.

Regards,
Ghazni