Triton Inference Server is sending back "HTTP/1.1 400 Bad Request"

Hi

I am setting up the inference server following the instructions in the quickstart guide:

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/quickstart.html
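
For context, I populated the example model repository as described there, roughly as follows (assuming I recall the script name correctly):

cd ~/wd500gb/github/triton-inference-server/docs/examples
./fetch_models.sh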

My setup is as follows:

GPU: RTX 2080 Ti
CUDA version: 10.2
Driver: 450.57
cuDNN: libcudnn8_8.0.2.39-1+cuda10.2_amd64

The inference server apparently starts and listens as described in the guide. However, when I run the curl test command, I don't receive a 200 from the server; instead I get the following:

Command: curl -v localhost:8000/v2/health/ready

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
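
For comparison, the quickstart shows a ready server answering this probe with an empty 200 response, roughly:

< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain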

Please help me with this issue. Thanks.

Regards,
Ghazni

=============================
== Command used to start the inference server

docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v/home/mgsaeed/wd500gb/github/triton-inference-server/docs/examples/model_repository:/models \
  nvcr.io/nvidia/tritonserver:20.07-v1-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==

NVIDIA Release 20.07 (build 14602913)

Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

2020-08-10 16:11:10.765523: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
I0810 16:11:10.792665 1 metrics.cc:164] found 1 GPUs supporting NVML metrics
I0810 16:11:10.798194 1 metrics.cc:173] GPU 0: GeForce RTX 2080 Ti
I0810 16:11:10.798388 1 server.cc:127] Initializing Triton Inference Server
I0810 16:11:10.955257 1 server_status.cc:55] New status tracking for model 'densenet_onnx'
I0810 16:11:10.955277 1 server_status.cc:55] New status tracking for model 'inception_graphdef'
I0810 16:11:10.955281 1 server_status.cc:55] New status tracking for model 'resnet50_netdef'
I0810 16:11:10.955285 1 server_status.cc:55] New status tracking for model 'simple'
I0810 16:11:10.955288 1 server_status.cc:55] New status tracking for model 'simple_string'
I0810 16:11:10.955312 1 model_repository_manager.cc:723] loading: simple:1
I0810 16:11:10.955387 1 model_repository_manager.cc:723] loading: simple_string:1
I0810 16:11:10.955491 1 model_repository_manager.cc:723] loading: resnet50_netdef:1
I0810 16:11:10.955541 1 model_repository_manager.cc:723] loading: inception_graphdef:1
I0810 16:11:10.955609 1 model_repository_manager.cc:723] loading: densenet_onnx:1
I0810 16:11:10.957881 1 base_backend.cc:176] Creating instance simple_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.957958 1 base_backend.cc:176] Creating instance simple_string_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.958099 1 base_backend.cc:176] Creating instance inception_graphdef_0_gpu0 on GPU 0 (7.5) using model.graphdef
I0810 16:11:10.984917 1 onnx_backend.cc:203] Creating instance densenet_onnx_0_gpu0 on GPU 0 (7.5) using model.onnx
2020-08-10 16:11:10.988128: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2020-08-10 16:11:10.989941: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2a34088380 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-10 16:11:10.989983: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-10 16:11:10.990153: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-10 16:11:10.991876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:10.991917: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:10.991964: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:10.991997: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:10.992066: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:10.998282: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:10.998368: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:10.998399: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:11.003025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
I0810 16:11:11.064634 1 netdef_backend.cc:206] Creating instance resnet50_netdef_0_gpu0 on GPU 0 (7.5) using init_model.netdef and model.netdef
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
2020-08-10 16:11:12.082177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.082227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.082238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.088323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
2020-08-10 16:11:12.092148: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f2a34766940 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-10 16:11:12.092182: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-08-10 16:11:12.094081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:12.094142: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:12.094158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:12.094173: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:12.094187: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:12.094215: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:12.094249: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:12.094261: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:12.097381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-08-10 16:11:12.097420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.097431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.097440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.100445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
2020-08-10 16:11:12.102552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:67:00.0
2020-08-10 16:11:12.102626: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-08-10 16:11:12.102656: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-08-10 16:11:12.102684: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-10 16:11:12.102705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-10 16:11:12.102749: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-10 16:11:12.102770: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-08-10 16:11:12.102789: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-10 16:11:12.106010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-08-10 16:11:12.106055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-10 16:11:12.106072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-10 16:11:12.106091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-10 16:11:12.109675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9610 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
I0810 16:11:12.110803 1 model_repository_manager.cc:888] successfully loaded 'simple' version 1
I0810 16:11:12.110835 1 model_repository_manager.cc:888] successfully loaded 'simple_string' version 1
I0810 16:11:12.203993 1 model_repository_manager.cc:888] successfully loaded 'inception_graphdef' version 1
I0810 16:11:12.813593 1 model_repository_manager.cc:888] successfully loaded 'densenet_onnx' version 1
I0810 16:11:12.892981 1 model_repository_manager.cc:888] successfully loaded 'resnet50_netdef' version 1
Starting endpoints, 'inference:0' listening on
I0810 16:11:12.895164 1 grpc_server.cc:1942] Started GRPCService at 0.0.0.0:8001
I0810 16:11:12.895179 1 http_server.cc:1428] Starting HTTPService at 0.0.0.0:8000
I0810 16:11:12.936628 1 http_server.cc:1443] Starting Metrics Service at 0.0.0.0:8002

Please try:
$ curl localhost:8000/api/status
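
If the server is running the legacy v1 HTTP API, the readiness probe also lives under /api; if I remember correctly:

$ curl localhost:8000/api/health/ready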

Thanks. I have tried that and it seems to work, though the response is not the same: there is no "HTTP/1.1 200 OK", but it does print the status of the various models.

After printing a lot of text, it ends with SERVER_READY. Is that the correct/expected response? Thanks.

Regards,
ghazni

curl localhost:8000/api/status
id: "inference:0"
version: "1.15.0"
uptime_ns: 62975350798
model_status {
  key: "densenet_onnx"
  value {
    config {
      name: "densenet_onnx"
      platform: "onnxruntime_onnx"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      input {
        name: "data_0"
        data_type: TYPE_FP32
        format: FORMAT_NCHW
        dims: 3
        dims: 224
        dims: 224
        reshape {
          shape: 1
          shape: 3
          shape: 224
          shape: 224
        }
      }
      output {
        name: "fc6_1"
        data_type: TYPE_FP32
        dims: 1000
        label_filename: "densenet_labels.txt"
        reshape {
          shape: 1
          shape: 1000
          shape: 1
          shape: 1
        }
      }
      instance_group {
        name: "densenet_onnx"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.onnx"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
model_status {
  key: "inception_graphdef"
  value {
    config {
      name: "inception_graphdef"
      platform: "tensorflow_graphdef"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 128
      input {
        name: "input"
        data_type: TYPE_FP32
        format: FORMAT_NHWC
        dims: 299
        dims: 299
        dims: 3
      }
      output {
        name: "InceptionV3/Predictions/Softmax"
        data_type: TYPE_FP32
        dims: 1001
        label_filename: "inception_labels.txt"
      }
      instance_group {
        name: "inception_graphdef"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.graphdef"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
model_status {
  key: "resnet50_netdef"
  value {
    config {
      name: "resnet50_netdef"
      platform: "caffe2_netdef"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 128
      input {
        name: "gpu_0/data"
        data_type: TYPE_FP32
        format: FORMAT_NCHW
        dims: 3
        dims: 224
        dims: 224
      }
      output {
        name: "gpu_0/softmax"
        data_type: TYPE_FP32
        dims: 1000
        label_filename: "resnet50_labels.txt"
      }
      instance_group {
        name: "resnet50_netdef"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.netdef"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
model_status {
  key: "simple"
  value {
    config {
      name: "simple"
      platform: "tensorflow_graphdef"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 8
      input {
        name: "INPUT0"
        data_type: TYPE_INT32
        dims: 16
      }
      input {
        name: "INPUT1"
        data_type: TYPE_INT32
        dims: 16
      }
      output {
        name: "OUTPUT0"
        data_type: TYPE_INT32
        dims: 16
      }
      output {
        name: "OUTPUT1"
        data_type: TYPE_INT32
        dims: 16
      }
      instance_group {
        name: "simple"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.graphdef"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
model_status {
  key: "simple_string"
  value {
    config {
      name: "simple_string"
      platform: "tensorflow_graphdef"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 8
      input {
        name: "INPUT0"
        data_type: TYPE_STRING
        dims: 16
      }
      input {
        name: "INPUT1"
        data_type: TYPE_STRING
        dims: 16
      }
      output {
        name: "OUTPUT0"
        data_type: TYPE_STRING
        dims: 16
      }
      output {
        name: "OUTPUT1"
        data_type: TYPE_STRING
        dims: 16
      }
      instance_group {
        name: "simple_string"
        count: 1
        gpus: 0
        gpus: 1
        kind: KIND_GPU
      }
      default_model_filename: "model.graphdef"
      optimization {
        input_pinned_memory {
          enable: true
        }
        output_pinned_memory {
          enable: true
        }
      }
    }
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
        ready_state_reason {
        }
      }
    }
  }
}
ready_state: SERVER_READY

OK, I went ahead and pulled/ran the clientsdk example as follows (I have edited this post after finding new information).

docker pull nvcr.io/nvidia/tritonserver:20.07-py3-clientsdk
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.07-py3-clientsdk

After this, from inside the container:
/workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

However, I got the following error:
error: failed to get model metadata: failed to parse the request JSON buffer: The document is empty. at 0 …

OK, this issue was resolved by pulling the previous clientsdk image associated with tensorrtserver:
docker pull nvcr.io/nvidia/tensorrtserver:20.02-py3-clientsdk
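
For reference, the v2 quickstart drives the same client binary against the densenet_onnx model; assuming a v2 server is running, its documented invocation is:

/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg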

So, in summary:
1: curl -v localhost:8000/v2/health/ready (not working; getting 400)
2: curl localhost:8000/api/status (apparently working; details in my previous message)
3: /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg (not working with tritonserver:20.07-py3-clientsdk)
4: /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg (working with nvcr.io/nvidia/tensorrtserver:20.02-py3-clientsdk)

Are those expected results? Many thanks.

Regards,
Ghazni

For the server status: if it reports SERVER_READY, the server itself should be fine.
For items 3 and 4, you have actually narrowed it down, so please check the difference between the SDKs. The 20.07 clientsdk speaks the new v2 (KFServing) protocol, while your server was started from the 20.07-v1-py3 image, which serves only the legacy v1 API; that would also explain the 400 from /v2/health/ready and the empty JSON document seen by the v2 client.
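
A minimal check, assuming the matching v2 server image is simply the same tag without the -v1 suffix (nvcr.io/nvidia/tritonserver:20.07-py3, as in the quickstart): start the server from it and re-run the readiness probe.

docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v/home/mgsaeed/wd500gb/github/triton-inference-server/docs/examples/model_repository:/models \
  nvcr.io/nvidia/tritonserver:20.07-py3 tritonserver --model-repository=/models

# From another shell; a ready v2 server should answer "HTTP/1.1 200 OK"
curl -v localhost:8000/v2/health/ready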

Many thanks, Morganh.

Regards,
Ghazni