HyperQ without changing any existing code?

I read this in the Kepler GK110 whitepaper:

Quote on HyperQ:
Applications that previously encountered false serialization across tasks, thereby limiting achieved GPU utilization, can see up to dramatic performance increase without changing any existing code.

So far, I have only seen examples that use HyperQ by dramatically changing the code. Hosting parallel tasks in CPU memory within the codebase seems complex, especially when I have to keep all running code within the same .cu file.

How can HyperQ be utilized without changing any existing code?
I haven't found any way of parallelizing tasks with the Nvidia API without changing any existing code.

What I need to do is this:
CPU code
GPU code part 1 (FFT)
CPU code to check the result, including a memcpy to host
GPU code part 2
CPU memcpy to host, then check for iterations to exit the CPU thread

I cannot run 32 tasks of GPU code part 1 by itself, as part 2 has to happen as well.

So the only way to fully utilize HyperQ is to use the API in some way to build a container above this code that parallelizes these tasks outside of this scope, so that the GPU can be used while the CPU code is running with a different dataset.
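One way such a container might look, as a minimal sketch and not a drop-in answer: each dataset gets its own host thread and its own CUDA stream, so one dataset's CPU check can run while another dataset's kernels occupy the GPU. The kernels `part1` and `part2` are trivial stand-ins (not an FFT), and `run_pipeline`, `run_all`, the fixed iteration count and the data sizes are hypothetical names and values chosen only for illustration. It restructures the host code, so it does not deliver the "without changing any existing code" claim.

```cuda
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Stand-in for "GPU code part 1" (the FFT in the post); it only scales the data.
__global__ void part1(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Stand-in for "GPU code part 2"; it only adds a constant.
__global__ void part2(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

// Runs the whole CPU/GPU/CPU/GPU/CPU loop for one dataset on its own stream.
// Writes the first element of the final data to *result; returns true on success.
bool run_pipeline(int n, int iters, float *result) {
    cudaStream_t s;
    float *h = nullptr, *d = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaStreamCreate(&s) != cudaSuccess) return false;
    // Pinned host memory, so the async copies can overlap with other streams.
    if (cudaMallocHost((void **)&h, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) h[i] = 1.0f;            // CPU code: prepare input
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, s);

    int blocks = (n + 255) / 256;
    for (int it = 0; it < iters; ++it) {
        part1<<<blocks, 256, 0, s>>>(d, n);             // GPU code part 1
        cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, s);
        cudaStreamSynchronize(s);                       // blocks only this host thread
        // CPU check of the part 1 result would go here (placeholder).
        part2<<<blocks, 256, 0, s>>>(d, n);             // GPU code part 2
        cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, s);
        cudaStreamSynchronize(s);
        // CPU check for the exit condition would go here (placeholder).
    }

    cudaError_t err = cudaGetLastError();
    *result = h[0];
    cudaFree(d);
    cudaFreeHost(h);
    cudaStreamDestroy(s);
    return err == cudaSuccess;
}

// The "container": one host thread per dataset, each with a private stream.
std::vector<float> run_all(int n_datasets, int n, int iters) {
    std::vector<float> results(n_datasets, -1.0f);
    std::vector<std::thread> workers;
    for (int k = 0; k < n_datasets; ++k)
        workers.emplace_back([&results, k, n, iters] {
            run_pipeline(n, iters, &results[k]);
        });
    for (auto &t : workers) t.join();
    return results;
}
```

Calling `run_all(4, 1024, 3)` from `main` would process four independent datasets. Whether their kernels actually overlap depends on the device and on how much of the GPU each kernel fills. Building likely needs something like `nvcc -std=c++11` (plus `-Xcompiler -pthread` on some Linux setups). CUDA releases from 7.0 onwards also offer `nvcc --default-stream per-thread`, which gives each host thread its own default stream and so comes closer to leaving the GPU code untouched.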

So far I have had to dramatically change the code to do this, at least going by the HyperQ sample I saw in CUDA 5.0.

Maybe I'm missing something.

I tried setting the environment flag and running 2 consoles, but that didn't work.
I guess the Titan card doesn't accept 2 separate CPU applications accessing the same GPU in HyperQ mode?
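Whether two processes may open the same GPU at all depends on the device's compute mode, which can be queried at runtime. The following is a small sketch of such a check; `query_device` is a hypothetical helper name, while the attribute enums are from the CUDA runtime API.

```cuda
#include <cuda_runtime.h>

// Fills in three properties of device `dev`; returns 0 on success, 1 on any error.
int query_device(int dev, int *cc_major, int *concurrent_kernels, int *compute_mode) {
    // Compute capability major version (3 for Kepler-class devices).
    if (cudaDeviceGetAttribute(cc_major, cudaDevAttrComputeCapabilityMajor, dev)
            != cudaSuccess) return 1;
    // Nonzero if the device can run kernels from several streams concurrently.
    if (cudaDeviceGetAttribute(concurrent_kernels, cudaDevAttrConcurrentKernels, dev)
            != cudaSuccess) return 1;
    // cudaComputeModeDefault means several host processes may use the device.
    if (cudaDeviceGetAttribute(compute_mode, cudaDevAttrComputeMode, dev)
            != cudaSuccess) return 1;
    return 0;
}
```

If `compute_mode` equals `cudaComputeModeDefault`, two console applications can both use the card. Even then, each process has its own context, so their kernels are generally time-sliced and do not share the GPU concurrently unless something like the proxy discussed in the thread linked below is in use.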

I found some info here.
https://devtalk.nvidia.com/default/topic/529136/hyperq-and-mpi/