Description
I am trying to use "Model Navigator" for Triton, but when I run "optimize" I get the following error (after the conversions finish).
root@:/home/ubuntu/model_navigator# model-navigator optimize bert.nav
2023-01-27 07:05:03 - INFO - model_navigator.log: optimize args:
2023-01-27 07:05:03 - INFO - model_navigator.log: model_name = my_model
2023-01-27 07:05:03 - INFO - model_navigator.log: model_path = /home/ubuntu/model_navigator/navigator_workspace/.input_data/input_model/torchscript-trace/model.pt
2023-01-27 07:05:03 - INFO - model_navigator.log: model_format = torchscript
2023-01-27 07:05:03 - INFO - model_navigator.log: model_version = 1
2023-01-27 07:05:03 - INFO - model_navigator.log: target_formats = ['tf-trt', 'tf-savedmodel', 'onnx', 'trt', 'torchscript', 'torch-trt']
2023-01-27 07:05:03 - INFO - model_navigator.log: onnx_opsets = [14]
2023-01-27 07:05:03 - INFO - model_navigator.log: tensorrt_precisions = ['fp32', 'fp16']
2023-01-27 07:05:03 - INFO - model_navigator.log: tensorrt_precisions_mode = hierarchy
2023-01-27 07:05:03 - INFO - model_navigator.log: tensorrt_explicit_precision = False
2023-01-27 07:05:03 - INFO - model_navigator.log: tensorrt_sparse_weights = False
2023-01-27 07:05:03 - INFO - model_navigator.log: tensorrt_max_workspace_size = 4294967296
2023-01-27 07:05:03 - INFO - model_navigator.log: atol = {'output__0': 0.23096442222595215}
2023-01-27 07:05:03 - INFO - model_navigator.log: rtol = {'output__0': 0.09238576889038086}
2023-01-27 07:05:03 - INFO - model_navigator.log: inputs = {'input__0': {'name': 'input__0', 'shape': [-1, 8], 'dtype': 'int64', 'optional': False}, 'input__1': {'name': 'input__1', 'shape': [-1, 8], 'dtype': 'int64', 'optional': False}}
2023-01-27 07:05:03 - INFO - model_navigator.log: outputs = {'output__0': {'name': 'output__0', 'shape': [-1, 2], 'dtype': 'float32', 'optional': False}}
2023-01-27 07:05:03 - INFO - model_navigator.log: min_shapes = None
2023-01-27 07:05:03 - INFO - model_navigator.log: opt_shapes = None
2023-01-27 07:05:03 - INFO - model_navigator.log: max_shapes = None
2023-01-27 07:05:03 - INFO - model_navigator.log: value_ranges = None
2023-01-27 07:05:03 - INFO - model_navigator.log: dtypes = None
2023-01-27 07:05:03 - INFO - model_navigator.log: engine_count_per_device = {}
2023-01-27 07:05:03 - INFO - model_navigator.log: triton_backend_parameters = {}
2023-01-27 07:05:03 - INFO - model_navigator.log: triton_launch_mode = local
2023-01-27 07:05:03 - INFO - model_navigator.log: triton_server_path = tritonserver
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_max_batch_size = 128
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_max_concurrency = 1024
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_max_instance_count = 5
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_concurrency = []
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_batch_sizes = []
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_instance_counts = {}
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_max_batch_sizes = []
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_preferred_batch_sizes = []
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_backend_parameters = {}
2023-01-27 07:05:03 - INFO - model_navigator.log: config_search_early_exit_enable = False
2023-01-27 07:05:03 - INFO - model_navigator.log: top_n_configs = 3
2023-01-27 07:05:03 - INFO - model_navigator.log: objectives = {'perf_throughput': 10}
2023-01-27 07:05:03 - INFO - model_navigator.log: max_latency_ms = None
2023-01-27 07:05:03 - INFO - model_navigator.log: min_throughput = 0
2023-01-27 07:05:03 - INFO - model_navigator.log: max_gpu_usage_mb = None
2023-01-27 07:05:03 - INFO - model_navigator.log: perf_analyzer_timeout = 600
2023-01-27 07:05:03 - INFO - model_navigator.log: perf_analyzer_path = perf_analyzer
2023-01-27 07:05:03 - INFO - model_navigator.log: perf_measurement_mode = count_windows
2023-01-27 07:05:03 - INFO - model_navigator.log: perf_measurement_request_count = 50
2023-01-27 07:05:03 - INFO - model_navigator.log: perf_measurement_interval = 5000
2023-01-27 07:05:03 - INFO - model_navigator.log: perf_measurement_shared_memory = none
2023-01-27 07:05:03 - INFO - model_navigator.log: perf_measurement_output_shared_memory_size = 102400
2023-01-27 07:05:03 - INFO - model_navigator.log: workspace_path = navigator_workspace
2023-01-27 07:05:03 - INFO - model_navigator.log: override_workspace = False
2023-01-27 07:05:03 - INFO - model_navigator.log: override_conversion_container = False
2023-01-27 07:05:03 - INFO - model_navigator.log: framework_docker_image = nvcr.io/nvidia/pytorch:22.10-py3
2023-01-27 07:05:03 - INFO - model_navigator.log: triton_docker_image = nvcr.io/nvidia/tritonserver:22.10-py3
2023-01-27 07:05:03 - INFO - model_navigator.log: gpus = ('all',)
2023-01-27 07:05:03 - INFO - model_navigator.log: verbose = False
2023-01-27 07:05:03 - INFO - model_navigator.utils.docker: Run docker container with image model_navigator_converter:22.10-py3; using workdir: /home/ubuntu/model_navigator
2023-01-27 07:05:06 - INFO - model_navigator.converter.transformers: Running command copy on /home/ubuntu/model_navigator/navigator_workspace/.input_data/input_model/torchscript-trace/model.pt
2023-01-27 07:05:06 - INFO - model_navigator.converter.transformers: Running command annotation on /home/ubuntu/model_navigator/navigator_workspace/converted/model.pt
2023-01-27 07:05:06 - INFO - model_navigator.converter.transformers: Saving annotations to /home/ubuntu/model_navigator/navigator_workspace/converted/model.pt.yaml
2023-01-27 07:05:06 - INFO - pyt.transformers: ts2onnx command started.
2023-01-27 07:05:17 - INFO - pyt.transformers: ts2onnx command succeed.
2023-01-27 07:05:18 - INFO - polygraphy.transformers: Polygraphy onnx2trt started.
2023-01-27 07:05:18 - WARNING - polygraphy.transformers: This conversion should be done on target GPU platform
2023-01-27 07:06:57 - INFO - polygraphy.transformers: onnx2trt command succeed.
2023-01-27 07:06:57 - INFO - polygraphy.transformers: Polygraphy onnx2trt succeeded.
2023-01-27 07:06:57 - INFO - polygraphy.transformers: Polygraphy onnx2trt started.
2023-01-27 07:06:57 - WARNING - polygraphy.transformers: This conversion should be done on target GPU platform
2023-01-27 07:25:40 - INFO - polygraphy.transformers: onnx2trt command succeed.
[I] Loading inference results from /home/ubuntu/model_navigator/navigator_workspace/converted/model-ts2onnx_op14-polygraphyonnx2trt_fp16_mh.plan.comparator_outputs.json
[I] Loading inference results from /home/ubuntu/model_navigator/navigator_workspace/converted/model-ts2onnx_op14-polygraphyonnx2trt_fp16_mh.plan.comparator_outputs.json
[I] Loading inference results from /home/ubuntu/model_navigator/navigator_workspace/converted/model-ts2onnx_op14-polygraphyonnx2trt_fp16_mh.plan.comparator_outputs.json
2023-01-27 07:25:40 - WARNING - polygraphy.transformers: Polygraphy onnx2trt conversion failed. Details can be found in logfile: /home/ubuntu/model_navigator/navigator_workspace/converted/model-ts2onnx_op14-polygraphyonnx2trt_fp16_mh.plan.log
2023-01-27 07:25:40 - INFO - model_navigator.converter.torch_tensorrt: model_navigator.converter.torch_tensorrt command started.
2023-01-27 07:25:40 - WARNING - model_navigator.converter.torch_tensorrt: This conversion should be done on target GPU platform
2023-01-27 07:26:10 - INFO - model_navigator.converter.torch_tensorrt: model_navigator.converter.torch_tensorrt command succeeded.
2023-01-27 07:26:10 - INFO - model_navigator.converter.torch_tensorrt: model_navigator.converter.torch_tensorrt command started.
2023-01-27 07:26:10 - WARNING - model_navigator.converter.torch_tensorrt: This conversion should be done on target GPU platform
2023-01-27 07:27:19 - INFO - model_navigator.converter.torch_tensorrt: model_navigator.converter.torch_tensorrt command succeeded.
2023-01-27 07:27:27 - INFO - optimize: Running Triton Model Configurator for converted models
2023-01-27 07:27:27 - INFO - optimize: - my_model.ts2onnx_op14
2023-01-27 07:27:27 - INFO - optimize: - my_model.ts2onnx_op14-polygraphyonnx2trt_fp32_mh
2023-01-27 07:27:27 - INFO - optimize: - my_model
2023-01-27 07:27:27 - INFO - optimize: - my_model.torch_tensorrt_module_precisionTensorRTPrecision.FP32
2023-01-27 07:27:27 - INFO - optimize: - my_model.torch_tensorrt_module_precisionTensorRTPrecision.FP16
2023-01-27 07:27:27 - INFO - optimize: Running triton model configuration variants generation for my_model.ts2onnx_op14
2023-01-27 07:27:27 - INFO - optimize: Generated model variant my_model.ts2onnx_op14 for Triton evaluation.
Traceback (most recent call last):
File "/opt/conda/bin/model-navigator", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/main.py", line 53, in main
cli(max_content_width=160)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/optimize.py", line 235, in optimize_cmd
config_results = _configure_models_on_triton(
File "/opt/conda/lib/python3.8/site-packages/model_navigator/cli/optimize.py", line 445, in _configure_models_on_triton
triton_server.start()
File "/opt/conda/lib/python3.8/site-packages/model_navigator/triton/server/server_local.py", line 71, in start
tritonserver_cmd = sh.Command(tritonserver_cmd)
File "/opt/conda/lib/python3.8/site-packages/sh.py", line 1310, in __init__
raise CommandNotFound(path)
sh.CommandNotFound: tritonserver
Steps To Reproduce

- Create a docker container for Model Navigator.
  I pulled and built the image below and tagged it as "model-navigator" (the log above shows framework_docker_image = nvcr.io/nvidia/pytorch:22.10-py3).
  I used this command to run the container:
  docker run -it --rm \
      --ipc=host \
      --gpus 1 \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /home/ubuntu/model_navigator:/home/ubuntu/model_navigator \
      -w /home/ubuntu/model_navigator \
      --net host \
      --name model-navigator \
      --triton_launch_mode docker \
      model-navigator /bin/bash
  The image below is from the Model Navigator GitHub. I didn't understand which directory I'm supposed to specify for "model-catalog", so I skipped that volume mount. (This may be the cause…?)
  I also didn't understand where and how I'm supposed to specify "triton_launch_mode=docker".
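My guess (unverified, based on triton_launch_mode appearing in the optimize config dump above) is that it is a Model Navigator option rather than a docker run flag, so it would be passed to the optimize command itself, something like:

```shell
# Assumption on my part: triton_launch_mode is a model-navigator option
# (it shows up in the optimize config log), not a docker run flag,
# so it would presumably be passed like this:
model-navigator optimize bert.nav --triton-launch-mode docker
```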
- Use the nav.pytorch.export API to export a BERT model inside the container.
  I was able to successfully create a .nav file.
- Run the "optimize" command inside the container:
  model-navigator optimize bert.nav
  Then I get the error mentioned above.
Judging from the error log, am I supposed to tag the docker container created during the process as "tritonserver"…?
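If I read the traceback right, the config dump shows triton_launch_mode = local, and in that mode server_local.py tries to resolve a local "tritonserver" binary via sh.Command, which raises sh.CommandNotFound. A quick check I ran inside the container (illustrative, just a PATH lookup):

```shell
# With triton_launch_mode=local (as logged above), Navigator looks for a
# local "tritonserver" executable; this checks whether one is on PATH:
command -v tritonserver || echo "tritonserver not on PATH"
```

In my container this prints "tritonserver not on PATH", which matches the sh.CommandNotFound above, so it seems the optimize run fell back to local mode instead of docker mode.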