Simple kernel takes too long to run using GpuMat, possible mistake in memory access

Please delete, this is a double post.