Cuda Error

Description

Here is my problem. When running YOLOv4 detection, I use the GPU to decode the video stream into frames, then pass the frames to TensorRT for inference. That is where the problem appears, as shown in this output:

inference elasped time:0.5551ms
post elasped time:0.0714ms
pre elasped time:1.0675ms
ERROR: C:\source\rtSafe\cuda\cudaElementWiseRunner.cpp (164) - Cuda Error in nvinfer1::rt::cuda::ElementWiseRunner::execute: 400 (invalid resource handle)
ERROR: FAILED_EXECUTION: Unknown exception
inference elasped time:0.5422ms
post elasped time:0.0723ms
pre elasped time:1.0797ms
ERROR: C:\source\rtSafe\cuda\cudaElementWiseRunner.cpp (164) - Cuda Error in nvinfer1::rt::cuda::ElementWiseRunner::execute: 400 (invalid resource handle)
ERROR: FAILED_EXECUTION: Unknown exception

When I use the CPU to capture the frames instead, everything works. I searched Baidu, where someone said the GPU should be initialized only once, but I tried that and it didn't help.

Environment

TensorRT Version: TensorRT-7.1.3.4
GPU Type: 1080 8G
Nvidia Driver Version: 451
CUDA Version: 11.0
CUDNN Version: cudnn-11.0-windows-x64-v8.0.1.13
Operating System + Version: win10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi @xcf1996,
Could you please try the same thing with the latest TensorRT release?
Also, a CUDA error like this is often caused by an incompatible CUDA driver.
I recommend checking the support matrix to confirm your driver, CUDA, and TensorRT versions are compatible.

Thanks!

No, using the latest TensorRT didn't fix it. The problem comes from this function:

void Yolo::doInference(const unsigned char* input, const uint32_t batchSize)
{
    //Timer timer;
    assert(batchSize <= m_BatchSize && "Image batch size exceeds TRT engines batch size");
    // Copy the input batch to the device on m_CudaStream.
    NV_CUDA_CHECK(cudaMemcpyAsync(m_DeviceBuffers.at(m_InputBindingIndex), input,
                                  batchSize * m_InputSize * sizeof(float), cudaMemcpyHostToDevice,
                                  m_CudaStream));
    // Note: this mutex is local to the call, so it does not serialize concurrent callers.
    std::mutex mtx;
    mtx.lock();
    std::cout << "Lock acquired... " << std::endl;
    assert(m_Context != nullptr);
    std::cout << "m_Context is not null... " << std::endl;
    // Queue inference on the same stream; enqueue returns false on failure.
    if (!m_Context->enqueue(batchSize, m_DeviceBuffers.data(), m_CudaStream, nullptr))
        std::cout << "enqueue failed, needs investigation" << std::endl;
    //m_Context->enqueue(batchSize, m_DeviceBuffers.data(), m_CudaStream, nullptr);
    mtx.unlock();
    // Copy each output tensor back to the host on the same stream.
    for (auto& tensor : m_OutputTensors)
    {
        NV_CUDA_CHECK(cudaMemcpyAsync(tensor.hostBuffer, m_DeviceBuffers.at(tensor.bindingIndex),
                                      batchSize * tensor.volume * sizeof(float),
                                      cudaMemcpyDeviceToHost, m_CudaStream));
    }
    // Wait for the copies and the inference to finish.
    cudaStreamSynchronize(m_CudaStream);
    //timer.out("inference");
}

The input is fine, but when execution reaches this call:
m_Context->enqueue(batchSize, m_DeviceBuffers.data(), m_CudaStream, nullptr)
it returns 0 (false), which doesn't make sense to me. enqueue lives in a DLL, so I can't step into it with a debugger. Maybe m_CudaStream is out of sync, or maybe it has simply disappeared, because I also need to use the GPU to decode frames. I am confused.
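
One detail in doInference above can be checked independently of TensorRT: the std::mutex is declared inside the function, so each call locks its own fresh mutex and no two threads ever contend on it. If the intent was to stop the decode thread and the inference thread from entering the enqueue section at the same time, the mutex would need to be shared, for example as a class member. The following is only a minimal sketch of that idea in plain C++. InferenceGuard and countWithSharedMutex are hypothetical names invented for illustration, not part of the code in the post or of TensorRT, and this is not claimed to be the cause of error 400.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Yolo class in the post; only the locking is shown.
class InferenceGuard {
public:
    // Runs `work` while holding a mutex shared by every caller of this object.
    template <typename Fn>
    void run(Fn&& work) {
        std::lock_guard<std::mutex> lock(m_Mutex);  // released automatically on scope exit
        work();
    }

private:
    std::mutex m_Mutex;  // a member, so all threads contend on the same lock
};

// Increments a plain (non-atomic) counter from several threads through the guard.
// The shared mutex serializes the increments, so none of them are lost.
int countWithSharedMutex(int threads, int itersPerThread) {
    InferenceGuard guard;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&guard, &counter, itersPerThread] {
            for (int i = 0; i < itersPerThread; ++i) {
                guard.run([&counter] { ++counter; });
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

In doInference, the analogous change would be to make the mutex a member of Yolo and hold it with std::lock_guard for the whole sequence that uses m_CudaStream, assuming the function really is called from more than one thread.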