CUDA Kernel does not do anything

Hi all,
I’m new to CUDA programming, and I’m facing something really weird. I have an image of size Pz*Px, and I need to do some processing for each pixel. So, first, I define the following:

dim3 block(Px, Pz);
dim3 grid(1);

The weird thing is that when Pz and Px are both 32, everything works fine, but when they are 64 (or larger), the kernel does not get launched. I thought the indexing inside the kernel might be causing the problem, so I removed my processing from the kernel and just printed some words with printf. Still, everything works for Pz=Px=32, but the kernel does not work (does not even get launched) for Pz=Px=64. Any idea what is going on?



Add proper CUDA error checking. If you don’t know what that is, Google for exactly those terms. Also, run your executable under the control of cuda-memcheck and address all issues it reports.
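For readers who have not seen it before, here is a minimal sketch of what such error checking can look like. The macro name CUDA_CHECK and the helper tryLaunch are made up for this illustration and are not part of the CUDA API. The sketch assumes the one-block launch from the question, tried once with 32x32 threads and once with 64x64.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (CUDA_CHECK is not a CUDA API name): if the
// wrapped call fails, print file, line and the runtime's error string, then exit.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void emptyKernel() {}

// Launches emptyKernel with a single block of the given shape and returns
// the launch status reported by the runtime.
cudaError_t tryLaunch(dim3 block) {
    emptyKernel<<<1, block>>>();
    return cudaGetLastError();   // catches launch-configuration errors
}

int main() {
    CUDA_CHECK(tryLaunch(dim3(32, 32)));   // 1024 threads in the block
    CUDA_CHECK(cudaDeviceSynchronize());   // catches errors raised while the kernel runs
    printf("32x32 launch OK\n");

    CUDA_CHECK(tryLaunch(dim3(64, 64)));   // 4096 threads: expected to fail here
    CUDA_CHECK(cudaDeviceSynchronize());
    printf("64x64 launch OK\n");
    return 0;
}
```

With checking like this, the second launch should stop the program with an error message that names the failing line, instead of the kernel silently doing nothing.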

Note that kernel-side printf() prints into a transfer buffer for output by the host. Make sure to flush the buffer with a call to cudaDeviceSynchronize() prior to terminating your program.
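A small sketch of the flush described above, assuming a toy kernel (helloKernel is a made-up name) launched with a 2x2 block. Without the cudaDeviceSynchronize() call, the program may exit before the device-side output reaches the console.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel for illustration only: each thread prints its own index.
__global__ void helloKernel() {
    printf("hello from thread (%u, %u)\n", threadIdx.x, threadIdx.y);
}

int main() {
    helloKernel<<<1, dim3(2, 2)>>>();

    // Kernel launches are asynchronous. Synchronizing waits for the kernel to
    // finish and flushes the device-side printf buffer to the host's stdout.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```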

CUDA limits the total number of threads per block, which is the product of the individual block dimensions. This limit is 1024, so 32*32 = 1024 is fine, but 64*64 = 4096 exceeds it and the kernel launch fails. To cover a larger image, use more than one block in the grid.
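One common way to stay under the per-block limit is to use a fixed, smaller block size and let a 2D grid of blocks cover the image. The sketch below is only an illustration: the kernel name processImage, the helper divUp, the 16x16 block size, and the row-major float image layout are all assumptions, and the actual per-pixel processing from the question is replaced by a placeholder write.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Round-up integer division: number of blocks needed to cover n items.
inline unsigned int divUp(unsigned int n, unsigned int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Hypothetical per-pixel kernel; the image is assumed to be stored row-major.
__global__ void processImage(float *img, int Px, int Pz) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int z = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < Px && z < Pz) {        // guard: the grid may overshoot the image edges
        img[z * Px + x] = 1.0f;    // placeholder for the real processing
    }
}

int main() {
    const int Px = 64, Pz = 64;
    float *d_img = nullptr;
    if (cudaMalloc(&d_img, Px * Pz * sizeof(float)) != cudaSuccess) return 1;

    dim3 block(16, 16);                                  // 256 threads per block
    dim3 grid(divUp(Px, block.x), divUp(Pz, block.y));   // 4 x 4 blocks for 64 x 64

    processImage<<<grid, block>>>(d_img, Px, Pz);
    cudaError_t err = cudaGetLastError();                // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // execution errors
    printf("status: %s\n", cudaGetErrorString(err));

    cudaFree(d_img);
    return err == cudaSuccess ? 0 : 1;
}
```

Because the grid is rounded up, image sizes that are not a multiple of the block size still work; the bounds check inside the kernel keeps the extra threads from writing outside the image.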

The proper error checking will tend to draw your attention to the problem.

Thank you for your help. The problem is solved now. I have faced another problem though:

I have created a Visual Studio project, and now everything works fine. I have a .exe file which takes some parameters as input and does the same computation on the CPU and the GPU for comparison. While this .exe file works on my personal laptop, the GPU part does not work on my office PC (the CPU part works well, but when the program starts the GPU, it crashes). What is the problem here? This .exe file should work on every PC with a GPU, right?


Not necessarily. What is the GPU in your personal laptop? What is the GPU in your office PC? What is the exact nvcc command line you used to build your application (if you use the MSVS IDE, you will probably need to turn on verbose logging to extract that information)?

My laptop GPU is a GTX 950M, and my office GPU is an RTX 2080 Ti.
Regarding the exact nvcc command line, I use Visual Studio 2019 to build my application. The attachment contains the project log from the build: TUI_CUDA.log (10.9 KB)
Is what you are looking for in there?

It would probably be a good idea to go into the project settings and add the device settings appropriate for your 2080 Ti (compute_75,sm_75) to the build. This topic is covered extensively elsewhere, and the setting is easy to find in the CUDA-specific project settings; please just Google for help here.
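If you are unsure which device settings a given machine needs, a short program can ask the runtime for each GPU's compute capability. This is a sketch only: gencodeFlag is a made-up helper that formats the value in the style of nvcc's -gencode option, which is the command-line counterpart of the compute_XX,sm_XX pair in the project settings.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: formats a compute capability (e.g. 7.5) as the
// corresponding nvcc -gencode option.
std::string gencodeFlag(int major, int minor) {
    std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // Print the device name, its compute capability, and a matching build option.
        printf("device %d: %s, compute capability %d.%d -> %s\n",
               dev, prop.name, prop.major, prop.minor,
               gencodeFlag(prop.major, prop.minor).c_str());
    }
    return 0;
}
```

Running this on both the laptop and the office PC shows which compute capabilities the build has to cover.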

I’m not sure that is the problem, however. I believe the settings you have now, which do target the 950, should also allow your code to be runnable on the 2080. These types of descriptions:

While this .exe file works on my personal laptop, the GPU part does not work on my office PC (the CPU part works well, but when the program starts the GPU, it crashes)

are really not that useful for others trying to help you. “Crash” is an indeterminate, overloaded term. It’s probably better just to show the console output that comes from running your program in the failing case. Also, if you have not already done so, I strongly encourage you to add rigorous, proper CUDA error checking to your code. If you have already done so, and you are convinced that the problem lies on the GPU side in the 2080 case, then the error checking should give a nice, crisp description of what the CUDA runtime thinks the problem is, as opposed to “crash”.