LKM (Linux Kernel Module) and CUDA

Hi,

I’ve been reading a lot of posts on various forums about GPUs “interacting” with the Linux kernel, and I have
seen many different answers (especially regarding KGPU).

My problem is the following: I receive a large amount of data in the kernel, roughly 5 GB/s.
The data consists of multiple “big” chunks that I would like to analyse on a GPU.

Since the GPU can only be driven through CUDA from user-land, I’m wondering whether any of you has come up with an efficient way
to transfer data from kernel-land to user-land (or, essentially, to CUDA) without copying it.

(I’ve read a lot about this subject, but none of it was specific to CUDA. I have read about
sockets, relayfs, etc., so I have a good idea of how to transfer data from the kernel to user-land, but
as I said, those approaches definitely do not avoid copying and are not suited to CUDA.)

Thank you.