Description
I converted a custom model for a multimodal service using TensorRT-LLM. I converted the model (vision encoder only) in two ways:
- trtexec (from TensorRT 10.0.0)
  - The resulting engine can be run with the async_v3 functions.
  - It works fine.
- builder.build_serialized_network (from TensorRT-LLM 0.10.0)
  - The model converts successfully, but the resulting engine does not work.
  - When I run it following the sample code (TensorRT-LLM/examples/multimodal/run.py at v0.10.0 · NVIDIA/TensorRT-LLM · GitHub), it fails with this message:
- [06/18/2024-05:19:51] [TRT] [E] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:847] CUDA error 400 launching __myl_MovRepCon kernel.)
So, my question is: what is the difference between trtexec and builder.build_serialized_network? They should produce nearly identical engines, so I don't understand why one can be run this way and the other cannot. Please help me understand why they behave differently.
Also, is it possible to get exactly the same output as the original model using TensorRT while still gaining at least a small performance advantage?
Thanks.
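For context, both engines are loaded and executed through the same runtime API. This is a sketch, not my exact script: "vision.engine", `device_buffers` (a mapping from I/O tensor name to a CUDA device pointer), and `stream_handle` (a CUDA stream handle) are placeholders, and `enqueueV3` in the error log is the C++ counterpart of Python's `execute_async_v3`.

```python
import tensorrt as trt

# Sketch of the runtime path where the error occurs; requires a GPU and
# previously allocated device buffers.
logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)

with open("vision.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()

# Bind every I/O tensor before enqueueing; an unset or invalid address can
# surface as a CUDA error inside enqueueV3 / execute_async_v3.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    context.set_tensor_address(name, device_buffers[name])

context.execute_async_v3(stream_handle)
```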
Environment
TensorRT Version: 10.1.0
TensorRT-LLM Version: 0.10.0 (Stable)
GPU Type: L40s
Nvidia Driver Version: 550.90.07
CUDA Version: 12.4
Operating System + Version: Ubuntu
Python Version (if applicable): 3.10.12
Baremetal or Container (if container which image + tag): nvidia/cuda:12.4.0-devel-ubuntu22.04
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered