[E] 2: [ltWrapper.cpp::setupHeuristic::349] Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed. )

forein · July 6, 2022, 7:07am

Description

I applied the two patches that NVIDIA provides on its official website for CUDA 10.2, but they only fix converting the model from ONNX to TRT; the error still occurs when evaluating with TensorRT. Even NVIDIA's own Python sample (network_api_pytorch_mnist) hits this error after two epochs. The C++ programs work, but the Python ones do not.

Environment

TensorRT Version: 8.4.1.5
GPU Type: Tesla V100
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 8.4.1
Operating System + Version: 18.04
Python Version (if applicable): 3.7.10
PyTorch Version (if applicable): 1.8.1
Baremetal or Container (if container which image + tag): Docker 20.10.7

Relevant Files

Program here:
import torch
import torchvision.models as models
import os
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import time

BATCH_SIZE = 32
USE_FP16 = True
resnext50 = models.resnext50_32x4d(num_classes=10)
dummy_input = torch.randn([BATCH_SIZE, 3, 224, 224], dtype=torch.float16)
resnext50.half()
resnext50, dummy_input = resnext50.cuda(), dummy_input.cuda()
torch.onnx.export(resnext50, dummy_input, 'resnext50.onnx', verbose=False)
os.system(r'./trtexec --onnx=resnext50.onnx --saveEngine=resnext50.trt --explicitBatch=32 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16')

target_dtype = np.float16 if USE_FP16 else np.float32
f = open("resnext50.trt", "rb")
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

input_batch = np.random.randn(BATCH_SIZE, 224, 224, 3).astype(target_dtype)
output = np.empty([BATCH_SIZE, 10], dtype = target_dtype)
d_input = cuda.mem_alloc(1 * input_batch.nbytes)
d_output = cuda.mem_alloc(1 * output.nbytes)
bindings = [int(d_input), int(d_output)]

stream = cuda.Stream()
preprocessed_inputs = np.array([input.transpose([2, 0, 1]) for input in input_batch])  # (BATCH_SIZE,224,224,3) -> (BATCH_SIZE,3,224,224)

for i in range(1000):
    t0 = time.time()
    cuda.memcpy_htod_async(d_input, preprocessed_inputs, stream)
    # context.execute_async_v2(bindings, stream.handle, None)
    # context.execute_async(BATCH_SIZE, bindings, stream.handle)
    context.execute_v2(bindings)
    cuda.memcpy_dtoh_async(output, d_output, stream)
    stream.synchronize()
    t = time.time() - t0
    print("\rPrediction cost {:.4f}s".format(t), end='')
print(output[0])

spolisetty · July 8, 2022, 1:10pm

Hi,

Enabling the cuBLAS tactic source may help.
Please refer to:

https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html?highlight=tactic_sources#tensorrt.TacticSource

https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html?highlight=tactic_sources#tensorrt.IBuilderConfig.set_tactic_sources
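
For readers following the links above, here is a minimal sketch of what enabling the cuBLAS tactic sources could look like with the Python builder API. It relies on `IBuilderConfig.get_tactic_sources()` and `set_tactic_sources()`, which take and return a bitmask with one bit per `trt.TacticSource` value. The helper names `tactic_source_mask` and `add_tactic_sources` are made up for this illustration and are not part of TensorRT. Tactic sources are a build-time setting, so the engine would have to be rebuilt; since the engine in this thread was built with trtexec, its `--tacticSources` option is probably the closer equivalent there.

```python
def tactic_source_mask(sources):
    """Combine TacticSource-like values (anything int() accepts) into a bitmask."""
    mask = 0
    for source in sources:
        mask |= 1 << int(source)
    return mask


def add_tactic_sources(config, sources):
    """OR the given tactic sources into the mask already set on `config`.

    `config` is expected to behave like tensorrt.IBuilderConfig, i.e. to offer
    get_tactic_sources() and set_tactic_sources(). Sources that are already
    enabled (for example cuDNN) stay enabled.
    """
    updated = config.get_tactic_sources() | tactic_source_mask(sources)
    config.set_tactic_sources(updated)
    return updated


# Hypothetical usage with a real builder config (requires TensorRT and an
# existing `builder` object, neither of which is created in this sketch):
#   import tensorrt as trt
#   config = builder.create_builder_config()
#   add_tactic_sources(config, [trt.TacticSource.CUBLAS, trt.TacticSource.CUBLAS_LT])
```

Adding to the existing mask rather than overwriting it is a deliberate choice in this sketch: passing only the cuBLAS bits to `set_tactic_sources()` would switch off every other tactic source.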

Also, it's better to avoid using PyTorch-GPU and PyCUDA together. Instead of allocating memory with PyCUDA, you can use torch tensors directly with TRT: the data_ptr() method returns the tensor's device memory address (see torch.Tensor.data_ptr in the PyTorch 1.12 documentation).
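
As a rough illustration of the data_ptr() suggestion, the sketch below replaces the PyCUDA allocations from the original program with torch tensors. The function names `device_bindings` and `run_with_torch_buffers` are hypothetical. The shapes and the input-then-output binding order are taken from the program in the question and are assumptions about that particular engine, so this is not a general recipe.

```python
def device_bindings(tensors):
    """Return the raw device addresses of objects exposing data_ptr()."""
    return [int(t.data_ptr()) for t in tensors]


def run_with_torch_buffers(context, batch_size=32, num_classes=10):
    """Run one inference using torch-allocated GPU buffers (no PyCUDA).

    `context` is assumed to be a tensorrt.IExecutionContext for an engine with
    one FP16 input of shape (batch_size, 3, 224, 224) followed by one FP16
    output of shape (batch_size, num_classes), as in the original post.
    """
    import torch  # imported lazily so device_bindings() works without PyTorch

    d_input = torch.randn(batch_size, 3, 224, 224,
                          dtype=torch.float16, device="cuda").contiguous()
    d_output = torch.empty(batch_size, num_classes,
                           dtype=torch.float16, device="cuda")

    bindings = device_bindings([d_input, d_output])

    # Enqueue on PyTorch's current CUDA stream so copies and inference share
    # one stream, then wait for the result before reading it.
    stream = torch.cuda.current_stream()
    context.execute_async_v2(bindings, stream.cuda_stream)
    stream.synchronize()
    return d_output.cpu()
```

The tensors must stay alive (and contiguous) for as long as TensorRT may read or write them, since only their addresses are handed over.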

Thank you.
