Error Code 1: Myelin (No results returned from cublas heuristic search) #2115


I am trying to convert a sequence-to-sequence model in ONNX format to a TensorRT engine. The inputs have dynamic shapes.
The error about the cuBLAS heuristic search is thrown when I run the build_engine() function.


TensorRT Version:
GPU Type: NVIDIA Tesla V100
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7.6
TensorFlow Version (if applicable): n/a
PyTorch Version (if applicable): 1.11.0
Baremetal or Container (if container which image + tag):

This is the ERROR trace

[07/01/2022-02:46:21] [TRT] [V] *************** Autotuning format combination: Float(E1,1024,1), Float(1024,1), Float(E0,E0,(# 1 (SHAPE attention_mask)),1), Int32(), Int32() -> Float(E1,1024,1), Float((* 64 (# 1 (SHAPE input_ids))),64,1), Float((* 16 E3),E3,E2,1) where E0=(* (# 1 (SHAPE attention_mask)) (# 1 (SHAPE attention_mask))) E1=(* 1024 (# 1 (SHAPE input_ids))) E2=(BROADCAST_SIZE (# 1 (SHAPE input_ids)) (# 1 (SHAPE attention_mask))) E3=(* E2 E2) ***************
[07/01/2022-02:46:21] [TRT] [V] --------------- Timing Runner: {ForeignNode[306 + (Unnamed Layer* 175) [Shuffle]...Add_165]} (Myelin)
[07/01/2022-02:46:21] [TRT] [W] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
[07/01/2022-02:46:21] [TRT] [W]  (# 1 (SHAPE attention_mask))
[07/01/2022-02:46:21] [TRT] [W]  (# 1 (SHAPE input_ids))
[07/01/2022-02:46:25] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2676, GPU 7130 (MiB)
[07/01/2022-02:46:25] [TRT] [E] 1: [codeGenerator.cpp::compileGraph::476] Error Code 1: Myelin (No results returned from cublas heuristic search)
Traceback (most recent call last):
  File "", line 110, in <module>
  File "", line 95, in test_tensorrt_m2m
    convert_onnx2trt("m2m100_418M", "models")
  File "/home/vanguyen/package/translate/translate/quantization/tensorrt/", line 61, in convert_onnx2trt
    build_and_save_engine(encoder_onnx_path, encoder_trt_path)
  File "/home/vanguyen/package/translate/translate/quantization/tensorrt/", line 51, in build_and_save_engine
AttributeError: 'NoneType' object has no attribute 'serialize'

This is the code I use to build the engine.

print('Beginning ONNX file parsing')
print("Network inputs:")
for i in range(network.num_inputs):
    tensor = network.get_input(i)
    print(tensor.name, trt.nptype(tensor.dtype), tensor.shape)

# allow TensorRT to use up to 16 GiB of GPU memory for tactic selection
config.max_workspace_size = common.GiB(16)
builder.max_batch_size = 4

# set an optimization profile for the dynamic shapes
min_length, max_length = 1, 200
opt_length = max_length // 2
min_batch_size, max_batch_size = 1, 4
profile.set_shape('input_ids', (min_batch_size, min_length), (min_batch_size, opt_length),
                  (min_batch_size, max_length))
profile.set_shape('attention_mask', (min_batch_size, min_length), (min_batch_size, opt_length),
                  (min_batch_size, max_length))

print('Building an engine...')
engine = builder.build_engine(network, config)  # the error is thrown here
with open(trt_file_path, 'wb') as fout:
    fout.write(engine.serialize())
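As the traceback shows, build_engine() returned None and the script crashed only later, on .serialize(). One common cause (besides the Myelin/cuBLAS failure logged here) is a malformed optimization profile: TensorRT requires min <= opt <= max element-wise in every profile. Below is a minimal, hypothetical pre-flight check, in plain Python with no TensorRT dependency, applied to the (min, opt, max) triples used in the snippet above:

```python
def validate_profile(name, min_shape, opt_shape, max_shape):
    """Check that a TensorRT optimization profile triple is well-formed:
    same rank everywhere and 0 < min <= opt <= max in every dimension."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError(f"{name}: min/opt/max must have the same rank")
    for i, (lo, mid, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not (0 < lo <= mid <= hi):
            raise ValueError(
                f"{name}: dim {i} violates 0 < min <= opt <= max "
                f"(got {lo}, {mid}, {hi})")
    return True

# The triples from the snippet above (batch fixed at 1, length 1..200):
validate_profile('input_ids', (1, 1), (1, 100), (1, 200))
validate_profile('attention_mask', (1, 1), (1, 100), (1, 200))
```

If the profiles pass this check and build_engine() still returns None, the real failure is usually reported in the builder log at VERBOSE severity before the None propagates to .serialize(), as happened here with the cuBLAS heuristic search error.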

These are my system specs.

$ dpkg -l | grep TensorRT
ii  libnvinfer-bin                                              8.0.3-1+cuda11.3                             amd64        TensorRT binaries
ii  libnvinfer-dev                                              8.0.3-1+cuda11.3                             amd64        TensorRT development libraries and headers
ii  libnvinfer-doc                                              8.0.3-1+cuda11.3                             all          TensorRT documentation
ii  libnvinfer-plugin-dev                                       8.0.3-1+cuda11.3                             amd64        TensorRT plugin libraries
ii  libnvinfer-plugin8                                          8.0.3-1+cuda11.3                             amd64        TensorRT plugin libraries
ii  libnvinfer-samples                                          8.0.3-1+cuda11.3                             all          TensorRT samples
ii  libnvinfer8                                                 8.0.3-1+cuda11.3                             amd64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                                        8.0.3-1+cuda11.3                             amd64        TensorRT ONNX libraries
ii  libnvonnxparsers8                                           8.0.3-1+cuda11.3                             amd64        TensorRT ONNX libraries
ii  libnvparsers-dev                                            8.0.3-1+cuda11.3                             amd64        TensorRT parsers libraries
ii  libnvparsers8                                               8.0.3-1+cuda11.3                             amd64        TensorRT parsers libraries
ii  onnx-graphsurgeon                                           8.0.3-1+cuda11.3                             amd64        ONNX GraphSurgeon for TensorRT package
ii  python3-libnvinfer                                          8.0.3-1+cuda11.3                             amd64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                                      8.0.3-1+cuda11.3                             amd64        Python 3 development package for TensorRT
ii  tensorrt                                                                     amd64        Meta package of TensorRT


Could you please share the ONNX model that reproduces the issue, along with the trtexec --verbose logs, for better debugging?

Thank you.


Here is the ONNX model: m2m100_418M-encoder.onnx (Google Drive).
I ran trtexec --explicitBatch --onnx=m2m100_418M-encoder.onnx --verbose, but it fails with a segmentation fault.

[07/01/2022-20:52:43] [V] [TRT] Registering tensor: input_ids for ONNX tensor: input_ids
[07/01/2022-20:52:43] [V] [TRT] Adding network input: attention_mask with dtype: int32, dimensions: (-1, -1)
[07/01/2022-20:52:43] [V] [TRT] Registering tensor: attention_mask for ONNX tensor: attention_mask
Segmentation fault

However, parsing the ONNX model with parser.parse() in the Python code works fine.

I managed to upgrade TensorRT, and the cuBLAS heuristic search problem disappeared.
However, I am now running into another assertion error. This is the stack trace.

[07/02/2022-11:04:10] [TRT] [V] *************** Autotuning format combination: Int32((# 1 (SHAPE input_ids)),1), Int32((# 1 (SHAPE input_ids)),1), Int32(1,1), Int32((# 1 (SHAPE input_ids)),1), Int32(), Int32(), Int32() -> Float((* 1024 (# 1 (SHAPE input_ids))),1024,1) ***************
[07/02/2022-11:04:10] [TRT] [V] --------------- Timing Runner: {ForeignNode[encoder.embed_tokens.weight...Add_1784]} (Myelin)
[07/02/2022-11:04:10] [TRT] [W] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
[07/02/2022-11:04:10] [TRT] [W]  (# 1 (SHAPE input_ids))
[07/02/2022-11:04:10] [TRT] [W]  (# 0 (SHAPE input_ids))
[07/02/2022-11:04:13] [TRT] [E] 2: [myelinSliceLayer.cpp::addSlice::110] Error Code 2: Internal Error (Assertion sliceOutDims[i] <= inputDims.d[i] failed. )

I really appreciate the help in troubleshooting this. Thank you so much.


We are unable to reproduce this error on a Tesla V100 GPU with TensorRT 8.4 GA.
Could you please make sure you have installed the correct version of TensorRT?
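A quick way to confirm which TensorRT build an environment actually picks up is to compare version strings as tuples rather than text (the dpkg listing above reports 8.0.3, while the working setup reports 8.4.1). The helper below is a hypothetical sketch that parses simple dotted version strings; comparing `tensorrt.__version__` against a required minimum would use the same logic:

```python
def version_tuple(v):
    """Parse a simple dotted version string like '8.4.1' into a
    tuple of ints so versions compare numerically, not lexically."""
    return tuple(int(part) for part in v.split('.'))

def at_least(installed, required):
    """True if the installed version meets or exceeds the required one."""
    return version_tuple(installed) >= version_tuple(required)

# The old install (8.0.3) is older than the working one (8.4.1):
print(at_least('8.0.3', '8.4.1'))  # → False
print(at_least('8.4.1', '8.4.1'))  # → True
```

Note this sketch assumes plain `major.minor.patch` strings; Debian-style suffixes such as `8.0.3-1+cuda11.3` would need to be stripped first.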

[07/05/2022-06:18:14] [I] TensorRT version: 8.4.1
[07/05/2022-06:18:18] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8401] # /opt/tensorrt/bin/trtexec --explicitBatch --onnx=m2m100_418M-encoder.onnx --verbose

Thank you.

Thank you. I managed to solve this by reinstalling TensorRT from the TAR file, so I am closing this discussion.

