CUDA program listening on a port (TCP, MPI, …?) with all data loaded in GPU memory, ready to react


I would like to create a server application that loads all my data into GPU memory (nearly the whole card, ~12 GB), keeps it resident there, and then waits for requests, listening on a TCP port, via MPI, or whatever else is possible. I want to avoid a CPU-to-GPU transfer of the dataset for every query, since all queries will use the same data.

In effect, this system would be a GPU-accelerated search engine, where I need very fast queries over a subset of my data.

What technologies or strategies could I use for this purpose?
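To make the idea concrete, here is a minimal sketch of the kind of process I have in mind: a plain POSIX TCP server whose `main` allocates the dataset on the GPU once at startup and then serves queries in a loop, so only the small query and result ever cross the PCIe bus. All names here (`search_kernel`, `DATA_BYTES`, the port, the kernel launch shape) are placeholders, and error handling is omitted:

```cuda
// Hypothetical sketch: load data onto the GPU once, then serve queries over TCP.
#include <cstdio>
#include <cstring>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <cuda_runtime.h>

#define PORT 5555
#define DATA_BYTES (11ULL << 30)   // ~11 GB resident dataset (placeholder size)
#define QUERY_BYTES 4096

// Placeholder kernel: a real search over d_data would go here.
__global__ void search_kernel(const char *data, const char *query, int *result) {
    if (threadIdx.x == 0 && blockIdx.x == 0)
        *result = 0;  // stub result
}

int main() {
    // 1. Allocate and fill GPU memory once, at startup.
    char *d_data;
    cudaMalloc(&d_data, DATA_BYTES);
    // ... cudaMemcpy the dataset from host/disk into d_data here ...

    char *d_query;
    int  *d_result;
    cudaMalloc(&d_query, QUERY_BYTES);
    cudaMalloc(&d_result, sizeof(int));

    // 2. Open an ordinary TCP listening socket.
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port        = htons(PORT);
    bind(srv, (sockaddr *)&addr, sizeof(addr));
    listen(srv, 16);

    // 3. Serve forever: per query, only QUERY_BYTES go host->device
    //    and sizeof(int) comes back; the 11 GB never move again.
    for (;;) {
        int client = accept(srv, nullptr, nullptr);
        char query[QUERY_BYTES];
        ssize_t n = read(client, query, sizeof(query));
        if (n > 0) {
            cudaMemcpy(d_query, query, n, cudaMemcpyHostToDevice);
            search_kernel<<<256, 256>>>(d_data, d_query, d_result);
            int result;
            cudaMemcpy(&result, d_result, sizeof(result), cudaMemcpyDeviceToHost);
            write(client, &result, sizeof(result));
        }
        close(client);
    }
}
```

Is this single-process, long-running approach the right pattern, or is there a better-established technology (MPI, an RPC framework, something CUDA-specific) for keeping GPU state warm between requests?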