Correct me if I'm wrong, but please clear this up for me: when calling a __device__ function from within a __global__ kernel, must I define a different thread layout for it (i.e. my device function uses X threads and my global kernel uses Y threads)? If so, what are the workarounds for synchronizing the two? Or should I fold the global portion of the kernel into my device function and call that instead?
You cannot launch a __device__ function like a kernel (with <<< >>>).
Kernel functions are always declared __global__.
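You can call a __device__ function from a kernel. It is an ordinary function call that runs in the calling thread, so it sees the same threadIdx, blockIdx and blockDim as the kernel. There is no separate thread layout to define and nothing extra to synchronize.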
If your kernel is too big, you can partition it into __device__ functions.
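To make the points above concrete, here is a minimal sketch of a kernel split into device functions. The names scaleKernel, globalIndex and scale, the array size and the launch configuration are all made up for illustration, not taken from your code. Only the kernel gets a <<< >>> launch; the device functions are plain calls made by each thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: runs inside the calling thread, so the built-in
// index variables here are those of the thread that called it.
__device__ int globalIndex() {
    return blockIdx.x * blockDim.x + threadIdx.x;
}

// Hypothetical helper: a piece of work factored out of the kernel.
// It has no launch configuration of its own.
__device__ float scale(float x, float factor) {
    return x * factor;
}

// The kernel is the only function launched with <<< >>>.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = globalIndex();                  // plain call, same thread
    if (i < n) {
        data[i] = scale(data[i], factor);   // plain call, same thread
    }
}

int main() {
    const int n = 1024;                     // assumed size for the example
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = nullptr;
    if (cudaMalloc((void **)&dev, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // One launch configuration, chosen for the kernel only.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(dev, n, 2.0f);

    // This copy blocks until the kernel has finished.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Check every element on the host against the expected value.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2.0f * (float)i) ok = false;
    }
    printf("%s\n", ok ? "all elements scaled" : "mismatch");
    return 0;
}
```

Built with something like nvcc example.cu, this should show that the X-versus-Y thread question does not come up: the thread count is set once, at the kernel launch, and each device function call just runs on whichever thread made it.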