Thanks. I tried your suggestion; the modified source is below. Now it won’t debug, failing with the error “CUDA Dynamic Parallelism debugging is not supported in preemption mode. Breakpoints will be disabled.” This is on Windows 10 with CUDA 8.0, on both Pascal and Maxwell GPUs. I don’t have a headless GPU to test with, but it’s unclear to me why calling cudaDeviceSynchronize would require dynamic parallelism. It happens on two different machines, so it’s probably not specific to my setup.
__global__ void Test() {}
int main()
{
    void* ptr = nullptr;  // declaration assumed; not in the original snippet
    cudaError_t err = cudaDeviceSynchronize();
    err = cudaMalloc(&ptr, 100);
    Test<<<1, 1>>>();
}
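For anyone trying to reproduce this, here is a minimal sketch of the same repro with every return value checked. It might help narrow down which call actually fails when run outside the debugger. It assumes all three calls in the snippet above are made from host code in main; CHECK is a hypothetical helper macro, not part of the CUDA API, and the ptr declaration is likewise an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): report a failing runtime call
// and exit main with a non-zero status.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t e_ = (call);                                     \
        if (e_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(e_));                    \
            return 1;                                                \
        }                                                            \
    } while (0)

// Empty kernel, as in the original snippet.
__global__ void Test() {}

int main()
{
    void* ptr = nullptr;  // assumed declaration

    CHECK(cudaDeviceSynchronize());   // host-side sync before any other work
    CHECK(cudaMalloc(&ptr, 100));

    Test<<<1, 1>>>();
    CHECK(cudaGetLastError());        // errors from the launch itself
    CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CHECK(cudaFree(ptr));
    std::printf("all calls succeeded\n");
    return 0;
}
```

If this runs cleanly from the command line but the debugger still refuses to attach, that would suggest the message comes from the debugger configuration rather than from any of the runtime calls.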