Description
I have a Speech-To-Text PyTorch model that I would like to convert to TensorRT.
The model structure is shown below in the “Relevant Files” section.
It has two inputs:
- tokens: torch.Size([1, 1]) (Data Type: integer)
- audio_features: torch.Size([1, 1, 512]) (Data Type: FP16)
and one output:
out: torch.Size([1, 1, 51865]) (Data Type: FP16)
If I generate inputs with the above size and run the model, I get the expected output.
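For reference, the sanity check I ran looks roughly like this (torch.int64 for the tokens is my assumption for "integer"):

import torch

# Dummy inputs matching the shapes and dtypes listed above (illustrative).
tokens = torch.ones((1, 1), dtype=torch.int64, device="cuda")
audio_features = torch.rand((1, 1, 512), dtype=torch.half, device="cuda")

with torch.no_grad():
    out = model(tokens, audio_features)
print(out.shape)  # expected: torch.Size([1, 1, 51865])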
How should I set up the compilation so that I get a TensorRT model object that can be deployed with Triton?
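For context, here is roughly how I intend to deploy the result with Triton's PyTorch backend (the repository path and file names below are illustrative, not an existing setup):

import torch

# The TorchScript path of torch_tensorrt.compile returns a ScriptModule,
# which should be saveable into a Triton model repository, e.g.:
#   model_repository/stt/1/model.pt  (with a config.pbtxt next to the version dir)
torch.jit.save(trt_model, "model_repository/stt/1/model.pt")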
I used the following code:
x = torch.ones((1, 1), dtype=torch.int64).to("cuda")      # tokens: shape [1, 1], integer
y = torch.ones((1, 1, 512), dtype=torch.half).to("cuda")  # audio_features: shape [1, 1, 512], FP16
inputs = [
    torch_tensorrt.Input(x.shape),
    torch_tensorrt.Input(y.shape),
]
# Compile with Torch-TensorRT
trt_model = torch_tensorrt.compile(
    model,
    inputs=inputs,
    enabled_precisions={torch.half},  # run with FP16
)
Here is the error I get (the file a.py contains the lines above):
Traceback (most recent call last):
File "a.py", line 47, in <module>
trt_model = torch_tensorrt.compile(model,
File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/_compile.py", line 124, in compile
ts_mod = torch.jit.script(module)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_script.py", line 1286, in script
return torch.jit._recursive.create_script_module(
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 476, in create_script_module
return create_script_module_impl(nn_module, concrete_type, stubs_fn)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 538, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_script.py", line 615, in _construct
init_fn(script_module)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 516, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 538, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_script.py", line 615, in _construct
init_fn(script_module)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 516, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 894, in compile_unbound_method
create_methods_and_properties_from_stubs(concrete_type, (stub,), ())
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
RuntimeError:
File "/usr/local/lib/python3.8/dist-packages/STT/model.py", line 43
def _conv_forward(self, x: Tensor, weight: Tensor, bias: Optional[Tensor]) -> Tensor:
return super()._conv_forward(
~~~~~~~~~~~~~~~~~~~ <--- HERE
x, weight.to(x.dtype), None if bias is None else bias.to(x.dtype)
)
'Conv1d._conv_forward' is being compiled since it was called from 'Conv1d.forward'
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 313
def forward(self, input: Tensor) -> Tensor:
return self._conv_forward(input, self.weight, self.bias)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
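From the traceback, the failure seems to happen inside torch.jit.script, which torch_tensorrt.compile calls internally, rather than in TensorRT itself. One workaround I have seen suggested but not yet verified is to trace the model first, so that compile() receives a ScriptModule and never scripts the custom Conv1d._conv_forward override (the input dtypes below are my assumptions):

import torch
import torch_tensorrt

# Untested sketch: torch.jit.trace avoids torch.jit.script entirely.
traced_model = torch.jit.trace(model, (x, y))
trt_model = torch_tensorrt.compile(
    traced_model,
    inputs=[
        torch_tensorrt.Input(x.shape, dtype=torch.int32),  # tokens (int32 assumed, since TensorRT typically takes int32 token inputs)
        torch_tensorrt.Input(y.shape, dtype=torch.half),   # audio_features
    ],
    enabled_precisions={torch.half},  # run with FP16
)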
Environment
TensorRT Version:
GPU Type:
- Tesla M60
Nvidia Driver Version:
- 525
CUDA Version:
- 11.8
CUDNN Version:
- 8.7.0.84
Operating System + Version:
- Ubuntu 20.04
Python Version (if applicable):
- 3.8
TensorFlow Version (if applicable):
- N/A
PyTorch Version (if applicable):
- 1.13.0.post200
Baremetal or Container (if container which image + tag):
- N/A
Relevant Files
Here is the Speech-To-Text (STT) model structure:
STT(
(encoder): AudioEncoder(
(conv1): Conv1d(80, 512, kernel_size=(3,), stride=(1,), padding=(1,))
(conv2): Conv1d(512, 512, kernel_size=(3,), stride=(2,), padding=(1,))
(blocks): ModuleList(
(0): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(1): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(2): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(3): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(4): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(5): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)
(ln_post): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(decoder): TextDecoder(
(token_embedding): Embedding(51865, 512)
(blocks): ModuleList(
(0): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(cross_attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(1): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(cross_attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(2): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(cross_attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(3): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(cross_attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(4): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(cross_attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(5): ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(cross_attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)
(ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)