TensorRT error: Cuda Runtime (invalid resource handle): torchaudio on GPU + TRT engine gives wrong results

Description

I have a simple audio classifier model. It first extracts a Mel spectrogram with torchaudio on the GPU, then runs model inference on the same GPU, but the result is wrong.
Strangely, if I extract the Mel spectrogram on the CPU and run inference on the GPU, the result is correct.

Here is my code:

import copy

import numpy as np
import torch
import torch.nn.functional as F
import torchaudio
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (creates and pushes PyCUDA's CUDA context)


def wav_to_frames(wave_data, win_len=int(16000 * 6.5)):
    # Pad or truncate the waveform so its length is an exact multiple
    # of win_len, then split it into (num_frames, 1, win_len) windows.
    num_frames = round(len(wave_data) / win_len)
    frame_len, wave_len = num_frames * win_len, len(wave_data)
    if frame_len > wave_len:
        x = F.pad(wave_data, (0, frame_len - wave_len))
    elif frame_len < wave_len:
        x = wave_data[:frame_len]
    else:
        x = wave_data
    return x.view(-1, 1, win_len)
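For context, a quick sanity check of what this function produces (illustrative, not part of the attached script): a 10 s clip at 16 kHz rounds to two 6.5 s windows, and the second window is zero-padded out to the full length.

# Hypothetical check: 10 s of 16 kHz audio -> two 6.5 s windows,
# the second zero-padded out to the full window length.
wave = torch.randn(16000 * 10)
frames = wav_to_frames(wave)
print(frames.shape)  # torch.Size([2, 1, 104000])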

def torchaudio_extract(waveform):
    # Build the Mel transform on the GPU and apply it to the (CUDA) waveform.
    torchaudio_melspec = torchaudio.transforms.MelSpectrogram(
        sample_rate=16000,
        n_fft=512,
        win_length=512,
        hop_length=160,
        center=True,
        pad_mode="reflect",
        power=2.0,
        norm='slaney',
        onesided=True,
        n_mels=64,
    ).to(torch.device('cuda'))(waveform)

    # (batch, n_mels, time) -> (batch, time, n_mels)
    return torchaudio_melspec.transpose(1, 2)
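A quick shape check for reference (illustrative): one 6.5 s window at 16 kHz with hop_length=160 and center=True yields 651 frames of 64 mel bins, which is where the reshape(1, 1, 651, 64) further down comes from.

# Hypothetical check: 104000 samples / hop 160 with center=True -> 651 frames.
x = torch.randn(1, int(16000 * 6.5), device='cuda')
print(torchaudio_extract(x).shape)  # torch.Size([1, 651, 64])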


class HostDeviceMem:
    # Pairs a pagelocked host buffer with its device allocation.
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem


class TRTGPUdev():
    def __init__(self, model_path, onnx_path=None):
        self.TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
        self.engine = self.get_engine(model_path)
        self.context = self.engine.create_execution_context()
        self.inputs, self.outputs, self.bindings, self.stream = self.allocate_buffers()

    def allocate_buffers(self):
        # Allocate one pagelocked host buffer and one device buffer per binding.
        inputs, outputs, bindings = [], [], []
        stream = cuda.Stream()
        for binding in self.engine:
            size = trt.volume(self.engine.get_binding_shape(binding)) * self.engine.max_batch_size
            trt_dtype = trt.nptype(self.engine.get_binding_dtype(binding))
            host_mem = cuda.pagelocked_empty(size, trt_dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            bindings.append(int(device_mem))
            if self.engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
        return inputs, outputs, bindings, stream

    def get_engine(self, trt_path):
        with open(trt_path, "rb") as f, trt.Runtime(self.TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    def do_inference_v2(self):
        # Copy inputs host->device, run the engine, copy outputs device->host.
        for inp in self.inputs:
            cuda.memcpy_htod_async(inp.device, inp.host, self.stream)
        self.context.execute_async_v2(bindings=self.bindings, stream_handle=self.stream.handle)
        for out in self.outputs:
            cuda.memcpy_dtoh_async(out.host, out.device, self.stream)
        self.stream.synchronize()
        return [out.host for out in self.outputs]

    def trt_engine(self, audio_path):
        # 1. Load the audio and move it to the GPU.
        wave_data, sr = torchaudio.load(audio_path)
        wave_data = wave_data.to(torch.device('cuda'))
        wavs = wav_to_frames(wave_data[0], int(6.5 * 16000))

        # 2. Extract features with torchaudio on the GPU.
        feats = [torchaudio_extract(i).reshape(1, 1, 651, 64) for i in wavs]

        # 3. Run inference with the TRT engine.
        result, msg = [], []
        for index, data in enumerate(feats):
            feed_data = data.cpu().detach().numpy()
            # Copy into the pagelocked buffer rather than rebinding .host,
            # so the async memcpy still reads from pinned memory.
            np.copyto(self.inputs[0].host, feed_data.ravel())
            trt_outputs = self.do_inference_v2()

            if trt_outputs[0][1] > 0.8 and index < 6:
                msg.append(self.time_tagging[index])
            result.append(copy.deepcopy(trt_outputs[0]))
        return result

Result:


[01/06/2022-17:51:41] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[01/06/2022-17:51:42] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
[01/06/2022-17:51:42] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[01/06/2022-17:51:42] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
[01/06/2022-17:51:48] [TRT] [E] 1: [scaleRunner.cpp::execute::144] Error Code 1: Cuda Runtime (invalid resource handle)
[01/06/2022-17:51:48] [TRT] [E] 1: [scaleRunner.cpp::execute::144] Error Code 1: Cuda Runtime (invalid resource handle)
[01/06/2022-17:51:48] [TRT] [E] 1: [scaleRunner.cpp::execute::144] Error Code 1: Cuda Runtime (invalid resource handle)

trt:    [array([0., 0.], dtype=float32), array([0., 0.], dtype=float32), array([0., 0.], dtype=float32)] []

But when I extract the features on the CPU (just changing .to(torch.device('cuda')) to .to(torch.device('cpu'))), I get the correct result, as follows:

[01/06/2022-18:18:46] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[01/06/2022-18:18:46] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
[01/06/2022-18:18:46] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
[01/06/2022-18:18:46] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0

trt:    [array([9.9998915e-01, 1.1265278e-05], dtype=float32), array([0.9989116 , 0.00110829], dtype=float32), array([9.9990976e-01, 8.9466572e-05], dtype=float32)] []

Environment

TensorRT Version: 8.2.1
GPU Type: Tesla P100-PCIE-16GB
Nvidia Driver Version: 450.80.02
CUDA Version: 11.0
CUDNN Version:
Operating System + Version: CentOS
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.10, torchaudio: 0.10.1

So, what went wrong, and how should I fix it?

Hi,

This looks like a CUDA context issue. Could you please share the complete script with us and, if possible, resources to reproduce the issue, for better debugging?

Are you using PyTorch and PyCUDA simultaneously?

Thank you.

demo.trt (14.4 MB)
trt_inference.py (5.7 KB)

Hi,
I really appreciate your reply; the attachments are my code. I am using PyTorch and PyCUDA simultaneously, and I had no idea that using them together could be a problem.
This has bothered me for a long time. I'm looking forward to your reply.

Best.

Hi,

I think you need to avoid using PyTorch (on GPU) and PyCUDA together. Instead of making allocations with PyCUDA, you can use torch tensors directly with TRT; specifically, the data_ptr() method gives the device memory address:
https://pytorch.org/docs/stable/generated/torch.Tensor.data_ptr.html
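
As a minimal sketch of that approach (illustrative: the function name and the (1, 2) output shape are assumptions, not from the attached script), allocate the input and output as CUDA torch tensors, pass their data_ptr() addresses as the bindings, and reuse PyTorch's current CUDA stream so everything runs in a single CUDA context:

import torch

def infer_with_torch(context, feats_gpu):
    # Input must be contiguous so data_ptr() points at one dense buffer.
    feats_gpu = feats_gpu.contiguous()
    # Output allocated by torch on the same device; (1, 2) matches the
    # two-class output in this thread; in general, query the engine's binding shape.
    output = torch.empty((1, 2), dtype=torch.float32, device='cuda')
    # data_ptr() gives the raw device addresses TensorRT expects as bindings.
    bindings = [int(feats_gpu.data_ptr()), int(output.data_ptr())]
    # Reuse PyTorch's current CUDA stream instead of a separate PyCUDA stream.
    stream = torch.cuda.current_stream()
    context.execute_async_v2(bindings=bindings, stream_handle=stream.cuda_stream)
    stream.synchronize()
    return output

With this pattern, the features produced by torchaudio_extract can be fed to the engine directly, without the round trip through pagelocked host memory, and no PyCUDA allocations or streams are involved.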

Please refer to the following issue for more details:

Thank you.