Triton Inference Server is sending back "HTTP/1.1 400 Bad Request"

Hi

I am setting up the inference server per the instructions in the guide:

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/quickstart.html

My setup is as follows:

GPU: RTX 2080 Ti
CUDA version: 10.2
Driver: 450.57
cuDNN: libcudnn8_8.0.2.39-1+cuda10.2_amd64

The inference server apparently starts and listens as described in the guide; however, when I run the curl test command I don't receive a 200 from the server and instead get the following:

Command: curl -v localhost:8000/v2/health/ready

* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact

Please help me with this issue. Thanks.

Regards,
Ghazni

=============================
== Command used to start the inference server ==

docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/mgsaeed/wd500gb/github/triton-inference-server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:20.07-v1-py3 tritonserver --model-repository=/models
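
Note: the -v1-py3 tag appears to select the legacy V1-API build of the 20.07 release (the status output further down reports version 1.15.0), while the quickstart's /v2/health/ready endpoint is served by the V2 build. A sketch of the same command against the V2 image, assuming everything else stays the same:

docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/mgsaeed/wd500gb/github/triton-inference-server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:20.07-py3 tritonserver --model-repository=/models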

=============================
== Triton Inference Server ==

NVIDIA Release 20.07 (build 14602913)

Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

2020-08-10 16:11:10.765523: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
I0810 16:11:10.792665 1 metrics.cc:164] found 1 GPUs supporting NVML metrics
I0810 16:11:10.798194 1 metrics.cc:173] GPU 0: GeForce RTX 2080 Ti
I0810 16:11:10.798388 1 server.cc:127] Initializing Triton Inference Server
I0810 16:11:10.955257 1 server_status.cc:55] New status tracking for model ‘densenet_onnx’
I0810 16:11:10.955277 1 server_status.cc:55] New status tracking for model ‘inception_graphdef’
I0810 16:11:10.955281 1 server_status.cc:55] New status tracking for model ‘resnet50_netdef’
I0810 16:11:10.955285 1 server_status.cc:55] New status tracking for model ‘simple’
I0810 16:11:10.955288 1 server_status.cc:55] New status tracking for model ‘simple_string’
I0810 16:11:10.955312 1 model_repository_manager.cc:723] loading: simple:1
I0810 16:11:10.955387 1 model_repository_manager.cc:723] loading: simple_string:1
I0810 16:11:10.955491 1 model_repository_manager.cc:723] loading: resnet50_netdef:1
I0810 16:11:10.955541 1 model_repository_manager.cc:723] loading: inception_graphdef:1
I0810 16:11:10.955609 1 model_repository_manager.cc:723] loading: densenet_onnx:1
I0810 16:11:10.957881 1 base_backend.cc:176] Creating instance simple_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.957958 1 base_backend.cc:176] Creating instance simple_string_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.958099 1 base_backend.cc:176] Creating instance inception_graphdef_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.984917 1 onnx_backend.cc:203] Creating instance densenet_onnx_0_gpu0 on GPU 0 (7.5) using model.onnx
2020-08-10 16:11:10.988128: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2020-08-10 16:11:10.989941: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2a34088380 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-10 16:11:10.989983: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-10 16:11:10.990153: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-10 16:11:10.991876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:10.991917: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:10.991964: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:10.991997: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:10.992066: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:10.998282: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:10.998368: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:10.998399: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:11.003025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
I0810 16:11:11.064634 1 netdef_backend.cc:206] Creating instance resnet50_netdef_0_gpu0 on GPU 0 (7.5) using init_model.netdef and model.netdef
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
2020-08-10 16:11:12.082177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.082227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.082238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.088323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) → physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
2020-08-10 16:11:12.092148: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2a34766940 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-10 16:11:12.092182: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-08-10 16:11:12.094081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:12.094142: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:12.094158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:12.094173: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:12.094187: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:12.094215: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:12.094249: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:12.094261: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:12.097381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-08-10 16:11:12.097420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.097431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.097440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.100445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) → physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
2020-08-10 16:11:12.102552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:12.102626: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:12.102656: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:12.102684: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:12.102705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:12.102749: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:12.102770: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:12.102789: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:12.106010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-08-10 16:11:12.106055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.106072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.106091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.109675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) → physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
I0810 16:11:12.110803 1 model_repository_manager.cc:888] successfully loaded ‘simple’ version 1
I0810 16:11:12.110835 1 model_repository_manager.cc:888] successfully loaded ‘simple_string’ version 1
I0810 16:11:12.203993 1 model_repository_manager.cc:888] successfully loaded ‘inception_graphdef’ version 1
I0810 16:11:12.813593 1 model_repository_manager.cc:888] successfully loaded ‘densenet_onnx’ version 1
I0810 16:11:12.892981 1 model_repository_manager.cc:888] successfully loaded ‘resnet50_netdef’ version 1
Starting endpoints, ‘inference:0’ listening on
I0810 16:11:12.895164 1 grpc_server.cc:1942] Started GRPCService at 0.0.0.0:8001
I0810 16:11:12.895179 1 http_server.cc:1428] Starting HTTPService at 0.0.0.0:8000
I0810 16:11:12.936628 1 http_server.cc:1443] Starting Metrics Service at 0.0.0.0:8002

Please try
$ curl localhost:8000/api/status
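
To also see the HTTP status line and headers in the response, curl's standard -v flag can be added:

$ curl -v localhost:8000/api/status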

Thanks. I have tried it, and it seems to be working, though the response message is not the same. There is no “HTTP/1.1 200 OK”; instead it prints the status of the various models.

After printing a lot of text, it does print SERVER_READY at the end. Is that the correct/expected response? Thanks.

Regards,
ghazni

curl localhost:8000/api/status
id: “inference:0”
version: “1.15.0”
uptime_ns: 62975350798
model_status {
key: “densenet_onnx”
value {
config {
name: “densenet_onnx”
platform: “onnxruntime_onnx”
version_policy {
latest {
num_versions: 1
}
}
input {
name: “data_0”
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: 3
dims: 224
dims: 224
reshape {
shape: 1
shape: 3
shape: 224
shape: 224
}
}
output {
name: “fc6_1”
data_type: TYPE_FP32
dims: 1000
label_filename: “densenet_labels.txt”
reshape {
shape: 1
shape: 1000
shape: 1
shape: 1
}
}
instance_group {
name: “densenet_onnx”
count: 1
gpus: 0
gpus: 1
kind: KIND_GPU
}
default_model_filename: “model.onnx”
optimization {
input_pinned_memory {
enable: true
}
output_pinned_memory {
enable: true
}
}
}
version_status {
key: 1
value {
ready_state: MODEL_READY
ready_state_reason {
}
}
}
}
}
model_status {
key: “inception_graphdef”
value {
config {
name: “inception_graphdef”
platform: “tensorflow_graphdef”
version_policy {
latest {
num_versions: 1
}
}
max_batch_size: 128
input {
name: “input”
data_type: TYPE_FP32
format: FORMAT_NHWC
dims: 299
dims: 299
dims: 3
}
output {
name: “InceptionV3/Predictions/Softmax”
data_type: TYPE_FP32
dims: 1001
label_filename: “inception_labels.txt”
}
instance_group {
name: “inception_graphdef”
count: 1
gpus: 0
gpus: 1
kind: KIND_GPU
}
default_model_filename: “model.graphdef”
optimization {
input_pinned_memory {
enable: true
}
output_pinned_memory {
enable: true
}
}
}
version_status {
key: 1
value {
ready_state: MODEL_READY
ready_state_reason {
}
}
}
}
}
model_status {
key: “resnet50_netdef”
value {
config {
name: “resnet50_netdef”
platform: “caffe2_netdef”
version_policy {
latest {
num_versions: 1
}
}
max_batch_size: 128
input {
name: “gpu_0/data”
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: 3
dims: 224
dims: 224
}
output {
name: “gpu_0/softmax”
data_type: TYPE_FP32
dims: 1000
label_filename: “resnet50_labels.txt”
}
instance_group {
name: “resnet50_netdef”
count: 1
gpus: 0
gpus: 1
kind: KIND_GPU
}
default_model_filename: “model.netdef”
optimization {
input_pinned_memory {
enable: true
}
output_pinned_memory {
enable: true
}
}
}
version_status {
key: 1
value {
ready_state: MODEL_READY
ready_state_reason {
}
}
}
}
}
model_status {
key: “simple”
value {
config {
name: “simple”
platform: “tensorflow_graphdef”
version_policy {
latest {
num_versions: 1
}
}
max_batch_size: 8
input {
name: “INPUT0”
data_type: TYPE_INT32
dims: 16
}
input {
name: “INPUT1”
data_type: TYPE_INT32
dims: 16
}
output {
name: “OUTPUT0”
data_type: TYPE_INT32
dims: 16
}
output {
name: “OUTPUT1”
data_type: TYPE_INT32
dims: 16
}
instance_group {
name: “simple”
count: 1
gpus: 0
gpus: 1
kind: KIND_GPU
}
default_model_filename: “model.graphdef”
optimization {
input_pinned_memory {
enable: true
}
output_pinned_memory {
enable: true
}
}
}
version_status {
key: 1
value {
ready_state: MODEL_READY
ready_state_reason {
}
}
}
}
}
model_status {
key: “simple_string”
value {
config {
name: “simple_string”
platform: “tensorflow_graphdef”
version_policy {
latest {
num_versions: 1
}
}
max_batch_size: 8
input {
name: “INPUT0”
data_type: TYPE_STRING
dims: 16
}
input {
name: “INPUT1”
data_type: TYPE_STRING
dims: 16
}
output {
name: “OUTPUT0”
data_type: TYPE_STRING
dims: 16
}
output {
name: “OUTPUT1”
data_type: TYPE_STRING
dims: 16
}
instance_group {
name: “simple_string”
count: 1
gpus: 0
gpus: 1
kind: KIND_GPU
}
default_model_filename: “model.graphdef”
optimization {
input_pinned_memory {
enable: true
}
output_pinned_memory {
enable: true
}
}
}
version_status {
key: 1
value {
ready_state: MODEL_READY
ready_state_reason {
}
}
}
}
}
ready_state: SERVER_READY
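
To skip the full dump next time, the ready-state lines can be filtered out (a sketch using standard curl and grep flags; this prints each model's state plus the final server state):

curl -s localhost:8000/api/status | grep ready_state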

OK, I went further ahead and pulled/ran the client SDK example as follows (I have also edited this post after finding new information).

docker pull nvcr.io/nvidia/tritonserver:20.07-py3-clientsdk
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.07-py3-clientsdk

After this, from inside the container:
/workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

However, I got the error below:
error: failed to get model metadata: failed to parse the request JSON buffer: The document is empty. at 0 …

OK, this issue was resolved by pulling the previous image associated with tensorrtserver:
docker pull nvcr.io/nvidia/tensorrtserver:20.02-py3-clientsdk
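
and then running the client the same way as before (assuming the same invocation pattern as above):

docker run -it --rm --net=host nvcr.io/nvidia/tensorrtserver:20.02-py3-clientsdk
/workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg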

So, in summary:
1: curl -v localhost:8000/v2/health/ready (not working; getting 400)
2: curl localhost:8000/api/status (apparently working - details in my previous message)
3: /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg (not working with tritonserver:20.07-py3-clientsdk)
4: /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg (working with nvcr.io/nvidia/tensorrtserver:20.02-py3-clientsdk)

Are those expected results? Many thanks.

Regards,
Ghazni


For the server status: if it reports ready, it should be fine.
For items 3 and 4: you have actually narrowed it down, so please check the difference between the SDKs.
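
One quick way to check the difference is to probe both endpoint families and compare the HTTP status codes (a sketch using curl's standard -w option; a V1 build such as 20.07-v1-py3 should return 200 for /api/status and 400 for the /v2 route, and a V2 build the reverse):

curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/api/status
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready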

Many thanks, Morganh.

Regards,
Ghazni