How do I convert a `cuda.mem_alloc` buffer to a PyTorch tensor without copying?

I want to speed up the feature-map extractor of a Faster R-CNN FPN model. The feature maps are large. I run the extractor with TensorRT, which gives me the output as a PyCUDA `mem_alloc` object, but I need a PyTorch tensor. My current conversion does a device-to-host `memcpy` and then re-uploads the data to the GPU, which takes far too long. How can I wrap the `mem_alloc` device memory in a PyTorch tensor without copying?

my code:
# Bindings are raw device pointers for TensorRT
binding = [int(d_input), int(d_output[0]), int(d_output[1]), int(d_output[2]), int(d_output[3])]

# Host -> device copy of the input
cuda.memcpy_htod_async(d_input, input_data_tensor.data.cpu().numpy().astype(NPDTYPE), stream)
context.execute(1, binding)

# Device -> host copies of the four feature maps -- this is the slow part
cuda.memcpy_dtoh_async(output1, d_output[0], stream)
cuda.memcpy_dtoh_async(output2, d_output[1], stream)
cuda.memcpy_dtoh_async(output3, d_output[2], stream)
cuda.memcpy_dtoh_async(output4, d_output[3], stream)
stream.synchronize()

# Host -> device again, just to get PyTorch tensors back on the GPU
ou1 = torch.tensor(output1, device="cuda")
ou2 = torch.tensor(output2, device="cuda")
ou3 = torch.tensor(output3, device="cuda")
ou4 = torch.tensor(output4, device="cuda")
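One common zero-copy pattern (a sketch, not tested against your model) is to skip `cuda.mem_alloc` for the outputs entirely: allocate the output buffers as PyTorch CUDA tensors up front and hand their raw device pointers to TensorRT as bindings via `Tensor.data_ptr()`. TensorRT then writes directly into memory PyTorch already owns, so no `memcpy_dtoh`/re-upload round trip is needed. The output shapes and the `context` object in the usage comment are placeholders for your setup:

```python
import torch

def trt_buffers(input_tensor, output_shapes, dtype=torch.float32):
    """Allocate TensorRT output buffers as PyTorch tensors and return
    (outputs, bindings). TensorRT writes straight into memory that
    PyTorch owns, so no device-to-host copy is ever made."""
    assert input_tensor.is_contiguous(), "TensorRT needs contiguous memory"
    outputs = [torch.empty(s, dtype=dtype, device=input_tensor.device)
               for s in output_shapes]
    # data_ptr() exposes the raw (device) address of each tensor -- no copy.
    bindings = [input_tensor.data_ptr()] + [o.data_ptr() for o in outputs]
    return outputs, bindings

# Hypothetical usage with TensorRT (requires a CUDA device and your engine):
#
#   x = input_data_tensor.contiguous().cuda()          # keep the input on GPU
#   outputs, bindings = trt_buffers(x, OUTPUT_SHAPES)  # your FPN map shapes
#   stream = torch.cuda.current_stream().cuda_stream
#   context.execute_async_v2(bindings, stream_handle=stream)
#   torch.cuda.current_stream().synchronize()
#   ou1, ou2, ou3, ou4 = outputs                       # already torch tensors
```

Running inference on PyTorch's current CUDA stream (rather than a separate PyCUDA stream) keeps ordering correct when later PyTorch ops consume the outputs. Note `execute_async_v2` takes the bindings list directly; if you stay on the implicit-batch `execute(1, binding)` API, the same `data_ptr()` bindings work there too.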