No support for concurrent kernels on the K1?

The cudaDeviceProp::concurrentKernels property reports ‘1’, but the ‘concurrentKernels’ sample always serializes kernel execution.
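
For anyone who wants to check the same flag on their board, here is a minimal sketch that reads it through the CUDA runtime API. Device index 0 is an assumption; the multiprocessor count is printed too because it is relevant to the discussion.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the GPU of interest is device 0
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    // concurrentKernels is 1 when the device claims it can run
    // several kernels at the same time, 0 otherwise.
    std::printf("%s: concurrentKernels=%d, multiProcessorCount=%d\n",
                prop.name, prop.concurrentKernels, prop.multiProcessorCount);
    return 0;
}
```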

Is this a temporary issue?

Perhaps this is related to the fact that the K1 has only one multiprocessor (SMX).
It may also lack the resources (registers, shared memory, local parameters) to run more than one of these kernels simultaneously; you’ll have to check the occupancy.
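
One way to check the occupancy is sketched below. It assumes a toolkit that provides cudaOccupancyMaxActiveBlocksPerMultiprocessor (CUDA 6.5 or later); `tiny_kernel` and the block size of 32 are hypothetical stand-ins, so substitute the kernel and launch configuration you actually use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel under test.
__global__ void tiny_kernel(int *out) {
    if (threadIdx.x == 0) out[blockIdx.x] = 1;
}

int main() {
    const int blockSize = 32;        // assumption: threads per block
    const size_t dynamicSmem = 0;    // assumption: no dynamic shared memory
    int blocksPerSM = 0;

    // Ask the runtime how many blocks of this kernel can be resident
    // on one multiprocessor at the same time.
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, tiny_kernel, blockSize, dynamicSmem);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "occupancy query failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("max resident blocks per multiprocessor: %d\n", blocksPerSM);
    return 0;
}
```

If the reported number is at least the total number of blocks you launch across all streams, registers and shared memory are unlikely to be what forces serialization.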


The “Concurrent Kernels” sample launches very small, low-resource kernels, so that’s not the problem.
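
A stripped-down test in the same spirit is sketched below. This is not the SDK sample itself: the `spin` kernel, the number of kernels, and the cycle count are all assumptions, and the timing relies on the legacy default stream (do not build with per-thread default streams). Each kernel is a single thread that busy-waits, so it needs almost no resources.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(e_));                     \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

// Hypothetical test kernel: one thread busy-waits for `cycles` clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main() {
    const int kNumKernels = 4;            // assumption
    const long long kCycles = 10000000;   // assumption: long enough to measure

    cudaStream_t streams[kNumKernels];
    for (int i = 0; i < kNumKernels; ++i) CHECK(cudaStreamCreate(&streams[i]));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up launch so one-time setup cost is not timed.
    spin<<<1, 1>>>(1);
    CHECK(cudaDeviceSynchronize());

    // Baseline: a single kernel.
    float singleMs = 0.0f;
    CHECK(cudaEventRecord(start, 0));
    spin<<<1, 1>>>(kCycles);
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&singleMs, start, stop));

    // The same kernel launched once per stream. The stop event sits in the
    // legacy default stream, so it completes only after all streams finish.
    float totalMs = 0.0f;
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < kNumKernels; ++i)
        spin<<<1, 1, 0, streams[i]>>>(kCycles);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&totalMs, start, stop));

    std::printf("1 kernel: %.2f ms, %d kernels in %d streams: %.2f ms\n",
                singleMs, kNumKernels, kNumKernels, totalMs);

    for (int i = 0; i < kNumKernels; ++i) CHECK(cudaStreamDestroy(streams[i]));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return 0;
}
```

If the second time is close to the first, the kernels overlapped; if it is close to the kernel count times the first, they were serialized.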

The same example runs on the GK208 (~2x the resources of a K1) without problems.

I filed a bug (#1526164).