Moving nvstd::function object (with __device__) to another class

Hello,
I want to pass nvstd::function objects to another class, where they should be executed. I wrote an example to make my problem clearer:

[Code-Description:]
GPUClass processes the nvstd::function object by launching a __global__ kernel that calls it on the GPU/device. In main, I create an nvstd::function object from a lambda expression and pass it to the GPUClass object.

[Code:]

#include <cstdio>
#include <nvfunctional>

template<class F>
__global__
void kernel(F f) {
  f();
}

class GPUClass {
	public:
	template<class K>
	void processLambda(K func_gpu){
		kernel<<<1,1>>>(func_gpu);
	}
};

int main() {
	nvstd::function<void(void)> func = [=]__device__(){printf("On GPU!\n");};
	
	GPUClass* gpuclass = new GPUClass();
	gpuclass->processLambda(func);
	return 0;
}

[Problem:]
The example does not print "On GPU!". I think the cause is the first line in main():

nvstd::function<void(void)> func = [=]__device__(){printf("On GPU!\n");};

The nvstd::function itself has no __device__ qualifier, so I assume it is treated as a host object and cannot be called on the device. But when I add __device__ to the declaration, I get the following error message:

error: an automatic __device__ variable declaration is not allowed inside a host function body

[Question:]
Is there a way to pass a lambda expression to another class so that it is executed on the device there?

Best regards,
Tobi

From the CUDA C++ Programming Guide, on nvstd::function:

Instances of nvstd::function in host code cannot be initialized with the address of a __device__ function or with a functor whose operator() is a __device__ function.

and:

nvstd::function instances cannot be passed from host code to device code (and vice versa) at run time.
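
To illustrate what these restrictions still allow: an nvstd::function can be created and called entirely in device code. Here is a minimal sketch of that case; hello() and runOnDevice() are names made up for this illustration and are not part of any API.

```cuda
#include <cstdio>
#include <nvfunctional>

// Hypothetical device function, used only for this illustration.
__device__ void hello() { printf("On GPU!\n"); }

__global__ void kernel() {
    // The wrapper is constructed in device code, so it may hold a __device__ function.
    nvstd::function<void()> f = hello;
    f();
}

// Hypothetical helper: launches the kernel and waits for it.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t runOnDevice() {
    kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // waits for the kernel to finish
}

int main() {
    return runOnDevice() == cudaSuccess ? 0 : 1;
}
```

The wrapper never crosses the host/device boundary here, so this does not let host code choose the function that the device runs.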

Passing the lambda itself, with its type deduced by auto instead of wrapped in nvstd::function, works:

$ cat t122.cu
#include <cstdio>
template<class F>
__global__
void kernel(F f) {
  f();
}

class GPUClass {
        public:
        template<class K>
        void processLambda(K func_gpu){
                kernel<<<1,1>>>(func_gpu);
        }
};

int main() {
        auto func = [=]__device__(){printf("On GPU!\n");};

        GPUClass* gpuclass = new GPUClass();
        gpuclass->processLambda(func);
        cudaDeviceSynchronize();
        return 0;
}
$ nvcc -std=c++11 t122.cu -o t122 --extended-lambda
$ ./t122
On GPU!
$

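Note that I also added a cudaDeviceSynchronize(). It is necessary because the kernel launch is asynchronous: without it, main() can return before the kernel has run and its printf output has been flushed to the host.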
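
If the class should keep the lambda and launch it later instead of immediately, one option is to make the class itself a template on the lambda's type. The following is only a sketch under that assumption: DeferredLaunch, makeDeferred and demo are hypothetical names, and it needs the same --extended-lambda flag as above.

```cuda
#include <cstdio>

template <class F>
__global__ void kernel(F f) {
    f();
}

// Hypothetical holder: stores a copy of the device lambda and launches it on request.
template <class F>
class DeferredLaunch {
public:
    explicit DeferredLaunch(F f) : f_(f) {}

    // Launches the stored callable and waits for it.
    // Returns the first CUDA error encountered, or cudaSuccess.
    cudaError_t run() const {
        kernel<<<1, 1>>>(f_);
        cudaError_t err = cudaGetLastError();  // reports launch failures
        if (err != cudaSuccess) return err;
        return cudaDeviceSynchronize();        // waits for the kernel to finish
    }

private:
    F f_;
};

// Hypothetical helper so that the lambda's type can be deduced.
template <class F>
DeferredLaunch<F> makeDeferred(F f) {
    return DeferredLaunch<F>(f);
}

// The extended lambda is defined inside a named host function, as in main() above.
cudaError_t demo() {
    auto job = makeDeferred([=] __device__ () { printf("On GPU!\n"); });
    return job.run();
}

int main() {
    return demo() == cudaSuccess ? 0 : 1;
}
```

The lambda's type stays a template parameter all the way to the kernel, so no nvstd::function is involved and nothing has to be converted between host and device at run time.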

Thank you so much. That helps a lot!