I’m new to CUDA programming, and I’m running into something really weird. I have an image of size Pz*Px, and I need to do some processing on each pixel. So, first, I define the following:
dim3 block(Px, Pz); dim3 grid(1);
The weird thing is that when Pz and Px are both 32, everything works fine, but when they are 64 (or bigger), the kernel does not get launched. I thought the indexing inside the kernel might be causing the problem, so I removed all the processing from the kernel and left only a printf. The result is the same: everything works for Pz=Px=32, but for Pz=Px=64 the kernel does not run at all (the printf output never appears). Any idea what is going on?
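For reference, here is a minimal sketch that should reproduce the setup and report any launch error. The kernel name processImage and the helper launchSingleBlock are hypothetical stand-ins, not my real code. Note that Pz=Px=32 gives 32*32 = 1024 threads in the single block, while Pz=Px=64 gives 4096, so the sketch also prints the device's maxThreadsPerBlock for comparison.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: one thread per pixel,
// with all processing removed and only a printf left in.
__global__ void processImage(int Px, int Pz)
{
    int x = threadIdx.x;  // pixel column
    int z = threadIdx.y;  // pixel row
    if (x == 0 && z == 0)
        printf("kernel launched for a %d x %d image\n", Px, Pz);
}

// Launch with the whole image in a single block, as in the post,
// and return the status of the launch (hypothetical helper).
cudaError_t launchSingleBlock(int Px, int Pz)
{
    dim3 block(Px, Pz);
    dim3 grid(1);
    processImage<<<grid, block>>>(Px, Pz);

    cudaError_t err = cudaGetLastError();   // errors in the launch configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel runs
    return err;
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);      // assumes device 0 is the GPU in use
    printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);

    const int sizes[] = {32, 64};
    for (int n : sizes) {
        cudaError_t err = launchSingleBlock(n, n);
        printf("Px = Pz = %d (%d threads per block): %s\n",
               n, n * n, cudaGetErrorString(err));
    }
    return 0;
}
```

Compiled with nvcc, this prints the error string for each size, which should show whether the 64*64 launch is being rejected before the kernel ever starts.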