I need a Hyper-Q example that runs a single kernel concurrently using streams and also partitions the data

I am new to Hyper-Q technology. I have a project with only one kernel. I need to partition the data and run that kernel concurrently on the partitions, using OpenMP or any other technology related to Hyper-Q. Most examples in the book “Professional CUDA C Programming” run a set of different kernels concurrently without considering how to divide the data. Can anyone help me?

This training session presents such code, both in the session itself (slide 11) and in the homework. There is also a recording of the session.
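For readers who cannot open the slides, here is a minimal sketch of the general pattern being asked about. It is not the code from the training session. It assumes a hypothetical kernel `scale` standing in for the project's single kernel, a hypothetical helper `scale_in_chunks`, and an arbitrary choice of 4 streams. The data is split into contiguous chunks, and each chunk gets its own copy-in, kernel launch, and copy-out on a separate stream. A single host thread issues all the work, so OpenMP is not required for this pattern. On a GPU with Hyper-Q, work in different streams goes through separate hardware queues, so chunks can overlap when resources allow.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the project's single kernel: doubles each element.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Splits h_data into num_streams contiguous chunks and issues copy-in,
// kernel, and copy-out for each chunk on its own stream.
// h_data should be pinned (cudaMallocHost) so the async copies can overlap.
// Returns the error reported by the allocation or the final synchronize.
cudaError_t scale_in_chunks(float *h_data, int n, int num_streams)
{
    float *d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;

    std::vector<cudaStream_t> streams(num_streams);
    for (int s = 0; s < num_streams; ++s) cudaStreamCreate(&streams[s]);

    const int threads = 256;
    const int chunk = (n + num_streams - 1) / num_streams;  // ceiling division

    for (int s = 0; s < num_streams; ++s) {
        int offset = s * chunk;
        int count = (n - offset < chunk) ? (n - offset) : chunk;  // last chunk may be short
        if (count <= 0) break;
        size_t bytes = count * sizeof(float);
        int blocks = (count + threads - 1) / threads;

        // Operations within one stream run in order; different streams may overlap.
        cudaMemcpyAsync(d_data + offset, h_data + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<blocks, threads, 0, streams[s]>>>(d_data + offset, count);
        cudaMemcpyAsync(h_data + offset, d_data + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    err = cudaDeviceSynchronize();  // wait for every stream to finish

    for (int s = 0; s < num_streams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    return err;
}

int main()
{
    const int n = 1 << 20;
    float *h_data = nullptr;
    if (cudaMallocHost(&h_data, n * sizeof(float)) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    cudaError_t err = scale_in_chunks(h_data, n, 4);  // 4 streams: arbitrary choice

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_data[i] != 2.0f) ++mismatches;
    printf("status: %s, mismatches: %d\n", cudaGetErrorString(err), mismatches);

    cudaFreeHost(h_data);
    return (err == cudaSuccess && mismatches == 0) ? 0 : 1;
}
```

Whether the kernels from different chunks actually run at the same time depends on each launch leaving some GPU resources free. With large chunks, the main gain usually comes from overlapping the copies of one chunk with the kernel of another. A profiler timeline is the reliable way to check what overlaps on a given device.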

Thanks a lot, Robert.