How to implement a kernel function for a custom plugin's enqueue?

Hello there,

I am trying to optimize a TensorFlow model with TensorRT 5. Based on the official sample code and the documentation, I am confident that my initial steps are implemented correctly: creating the UFF file from TF, parsing it, building the engine, and serializing and deserializing it.

I am currently struggling with inference.
To understand how a custom layer receives its input, I am trying to implement a simple kernel that copies the input to the output. I wrote a wrapper that calls my kernel:

In enqueue:

int FCPlugin::enqueue(int batchSize, const void* const* inputs, void** outputs, void* workspace, cudaStream_t stream)
{
    Wrapper::wrapper(const_cast<float*>((const float*)inputs), (float*)outputs);
    return 0;
}

__global__ void pass(float* input, float* output, int maxidx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < maxidx) {
        output[i] = input[i];
    }
}

namespace Wrapper {
    void wrapper(float* inputs, float* outputs)
    {
        pass<<<1, 10>>>(inputs, outputs, 10);
        printf("Kernel_call: pass through\n");
    }
}

However, during inference the error “Cuda failure: 77” occurs.
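A likely cause is the cast (const float*)inputs in the enqueue code. In a TensorRT plugin, inputs and outputs are host-side arrays of device pointers (one entry per input/output tensor), so casting the array itself to float* makes the kernel read and write addresses that are not the tensor data. With CUDA 10.0 and earlier, error code 77 is cudaErrorIllegalAddress, which fits this. The sketch below is a self-contained illustration under stated assumptions, not the fix the poster eventually used (the thread does not describe it). It indexes inputs[0] and outputs[0], sizes the grid from the batch size instead of a hard-coded 10, and launches on the stream argument. The names enqueuePassThrough, runOnce and the volume parameter are hypothetical; the driver allocates device memory only to stand in for the buffers TensorRT would provide, and the output is assumed to have the same size as the input.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Copies n floats from input to output, one thread per element.
__global__ void passThrough(const float* input, float* output, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        output[i] = input[i];
    }
}

// Mirrors the argument layout of the plugin's enqueue so it can be tested
// without TensorRT. `volume` (elements per batch item) is a hypothetical
// parameter; a real plugin would compute it from the input Dims it was given.
int enqueuePassThrough(int batchSize, const void* const* inputs, void** outputs,
                       int volume, cudaStream_t stream)
{
    // inputs/outputs are arrays of device pointers, one entry per tensor:
    // index them instead of casting the array itself to float*.
    const float* in = static_cast<const float*>(inputs[0]);
    float* out = static_cast<float*>(outputs[0]);

    int n = batchSize * volume;
    const int threads = 256;
    int blocks = (n + threads - 1) / threads;                 // cover every element
    passThrough<<<blocks, threads, 0, stream>>>(in, out, n);  // use the plugin's stream
    return cudaGetLastError() == cudaSuccess ? 0 : 1;
}

// Test driver (hypothetical): allocates device buffers the way TensorRT would
// for a real plugin, runs the enqueue once and copies the result back.
std::vector<float> runOnce(const std::vector<float>& host, int batchSize, int volume)
{
    assert(host.size() == static_cast<std::size_t>(batchSize) * volume);
    std::size_t bytes = host.size() * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    const void* inputs[] = {dIn};
    void* outputs[] = {dOut};
    int status = enqueuePassThrough(batchSize, inputs, outputs, volume, stream);
    cudaStreamSynchronize(stream);

    std::vector<float> result(host.size());
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaStreamDestroy(stream);
    cudaFree(dIn);
    cudaFree(dOut);
    if (status != 0) {
        result.clear();  // signal a failed launch to the caller
    }
    return result;
}
```

Launching on the stream TensorRT passes in keeps the plugin ordered with the rest of the engine's work; the default-stream launch in the original code does not.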

So my final question is:
Do I have to allocate GPU memory beforehand, or prepare anything else? Could someone please provide a simple code sample that does this?
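On the memory question: TensorRT allocates the device buffers for a plugin's inputs and outputs itself, and any extra scratch memory is requested through getWorkspaceSize and handed to enqueue as the workspace pointer, so enqueue should not need its own cudaMalloc. For a pure identity layer a custom kernel is not even required. The sketch below assumes float data; enqueueIdentityCopy and volume (the number of elements per batch item) are hypothetical names. It performs one device-to-device cudaMemcpyAsync on the plugin's stream.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Identity layer without a custom kernel. The pointers in inputs/outputs are
// already device memory, so one asynchronous device-to-device copy on the
// stream passed to enqueue is enough. `volume` (hypothetical) is the number
// of float elements per batch item.
int enqueueIdentityCopy(int batchSize, const void* const* inputs, void** outputs,
                        int volume, cudaStream_t stream)
{
    std::size_t bytes = static_cast<std::size_t>(batchSize) * volume * sizeof(float);
    cudaError_t err = cudaMemcpyAsync(outputs[0], inputs[0], bytes,
                                      cudaMemcpyDeviceToDevice, stream);
    return err == cudaSuccess ? 0 : 1;
}
```

The copy is only enqueued on the stream; TensorRT (or the caller) synchronizes before the output is read.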

P.S. I know that TensorRT 6 has been released and is recommended, but for my purposes I have to use TensorRT 5. I have also read the relevant parts of the official documentation, so I think my general workflow should be fine by now.

Thanks in advance.

Could you please let us know if you are still facing this issue?


I already solved the issue.