Sharing CUDA memory with Numba

I am trying to share a PyTorch tensor between processes with Numba, as shown below, and I get strange results.
When I read the handle with Numba, I get an incorrect result from the network from time to time. After I added a delay after converting the tensor, it started working correctly, and I don't understand why this happens.

Code sample:

import pickle

import numpy as np
import torch
from numba import cuda

# The remaining names (_prepare_shape_strides_dtype, current_context, devices)
# come from Numba's internal modules; their imports are not shown here.

backbone_features = torch.cat(backbone_features_list, dim=0)
desc = backbone_features.__cuda_array_interface__
shape = desc["shape"]
strides = desc.get("strides")
dtype = np.dtype(desc["typestr"])
shape, strides, dtype = _prepare_shape_strides_dtype(shape, strides, dtype, order="C")
size = cuda.driver.memory_size_from_info(shape, strides, dtype.itemsize)
# Wrap the tensor's device memory without copying it
devptr = cuda.driver.get_devptr_for_active_ctx(
    backbone_features.__cuda_array_interface__["data"][0])
data = cuda.driver.MemoryPointer(
    current_context(), devptr, size=size, owner=backbone_features)
# Export an IPC handle and serialize it together with the array metadata
ipch = devices.get_context().get_ipc_handle(data)
desc = dict(shape=shape, strides=strides, dtype=dtype)
handle = pickle.dumps([ipch, desc])
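
For reference, here is a rough pure-Python sketch of what the size computation above works out. In the CUDA Array Interface, a `strides` value of `None` means the array is C-contiguous, so the strides can be derived from the shape. The helper names `c_contiguous_strides` and `extent_nbytes` are made up for illustration; this is not Numba's implementation, only the same arithmetic under that assumption.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def extent_nbytes(shape, strides, itemsize):
    """Number of bytes spanned by an array with the given layout."""
    if strides is None:
        # None means C-contiguous in the CUDA Array Interface
        strides = c_contiguous_strides(shape, itemsize)
    if any(n == 0 for n in shape):
        return 0
    lowest = highest = 0
    for n, s in zip(shape, strides):
        offset = (n - 1) * s  # byte offset of the last element along this axis
        if offset > 0:
            highest += offset
        else:
            lowest += offset
    return highest - lowest + itemsize
```

For a contiguous float16 array of shape (2, 3), `extent_nbytes((2, 3), None, 2)` gives 12 bytes, which is the size the IPC handle needs to cover.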

Could this happen because new data is loaded into the buffer before the current data has been read?
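
One possible explanation, not confirmed here: PyTorch queues CUDA work asynchronously, so `torch.cat` may still be running on the GPU when the handle is exported, and a delay would hide that. A minimal sketch of waiting for the GPU before exporting is below. `export_after_sync` and `make_handle` are hypothetical names; `make_handle` stands for the handle-building code from the question. The `torch` import is made optional only so the sketch can be imported on a machine without PyTorch.

```python
try:
    import torch  # optional here so the sketch imports without PyTorch
except ImportError:
    torch = None


def export_after_sync(tensor, make_handle):
    """Wait for queued GPU work, then build the IPC handle.

    `make_handle` is a hypothetical callable that wraps the
    handle-building code from the question.
    """
    if torch is not None and torch.cuda.is_available():
        # Block the host until all queued work on the current device is done
        torch.cuda.synchronize()
    return make_handle(tensor)
```

If this is the cause, the reading process would otherwise see the buffer before the concatenated data has been written. The tensor also has to stay alive in the producing process until the reader is done, otherwise its memory may be reused for new data.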

I also tried the basic example using numba.cuda.api:

        arr = cuda.to_device(tensor)
        handle = arr.get_ipc_handle()
        handle = pickle.dumps(handle)

and

with handle as ipc_array:
    z = cuda.open_ipc_array(ipc_array, 1, dtype='float16', strides=None, offset=0)
    hary = z.args[0].copy_to_host(stream=cuda.stream())
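
A note on the read side: according to the Numba documentation, `copy_to_host` performs the transfer asynchronously when a stream is passed, so the host array should not be used until that stream has been synchronized. A small sketch of this is below. `copy_to_host_synced` is a hypothetical helper; it assumes `device_array` is the array obtained from the opened IPC handle and `stream` is a Numba stream such as the one returned by `cuda.stream()`.

```python
def copy_to_host_synced(device_array, stream):
    """Copy a device array to the host and wait for the copy to finish."""
    # With a stream argument the transfer is queued on that stream
    host = device_array.copy_to_host(stream=stream)
    # Wait until everything queued on the stream, including the copy, is done
    stream.synchronize()
    return host
```

Calling `copy_to_host()` without a stream is the simpler option when no overlap with other work is needed, since the call then returns only after the copy is finished.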