The runtime API makes it really easy to launch kernels with the <<<…>>> construct when those kernels are defined in your .cu file. But I’m forced to use CUstream objects from the driver API, and these can’t be passed to the <<<…>>> construct.
Is there a way to use the driver API to launch a kernel that’s right there in my .cu file, without going through the hassle of loading it from an external cubin? Or does anyone know a way to re-package those CUstream objects as cudaStream_t (which driver_types.h declares as ‘int’)?
If the runtime is simply hiding this whole process, then how does nvcc manage to tuck the .cubin code into the .o files? I’d rather not have to mess with external cubins.