unspecified launch failure problem


I am working on a GPU project and have run into a tricky problem. I want to increase the grid size and reduce the block size, because the kernels in the benchmark suite I am testing do not launch enough blocks.

For example, if the original block size is 256 threads/block, I shrink it to 128 and the grid size doubles automatically. However, when I run the new version of the CUDA code, it always fails with:

                       CUDA error: unspecified launch failure

I also tried making the block size larger. That does not trigger the error, but the output result is incorrect.

Has anybody had a similar experience? How can I manually increase the number of blocks in the kernel launch without affecting the final output? Do I need to modify the raw input data? I was told the block size is the programmer's decision, so why can't I change it?

Thank you so much!


It sounds like something in the kernel is hardcoded to assume a block size of 256. That can easily cause an out-of-bounds access, for instance by having 256 threads' worth of work written into an array sized for 128, which on the GPU surfaces as "unspecified launch failure" rather than a segfault. The only thing to do is dig through the kernel code, find the offending line(s), and change them to use the actual block size (`blockDim.x`) instead of a hardcoded constant.
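As a hypothetical illustration (this is not the asker's actual kernel), here is the kind of bug to look for: a reduction kernel whose shared-memory size and indexing are hardcoded to 256 breaks as soon as the launch uses a different block size, while a version that derives everything from `blockDim.x` and dynamic shared memory works at any block size:

```cuda
// Hypothetical broken kernel: hardcoded for 256 threads/block.
__global__ void sum_bad(const float *in, float *out, int n) {
    __shared__ float buf[256];               // overflows if blockDim.x > 256
    int i = blockIdx.x * 256 + threadIdx.x;  // wrong indexing if blockDim.x != 256
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    // ... reduction loops assuming 256 threads ...
}

// Portable version: size and index everything from blockDim.x,
// and use dynamic shared memory so the launch picks the block size.
__global__ void sum_ok(const float *in, float *out, int n) {
    extern __shared__ float buf[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    // Tree reduction over whatever block size was launched
    // (assumes blockDim.x is a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = buf[0];
}

// Launch: the grid size scales automatically with the chosen block size,
// and the third launch parameter supplies the shared-memory bytes.
// int block = 128;
// int grid  = (n + block - 1) / block;
// sum_ok<<<grid, block, block * sizeof(float)>>>(d_in, d_out, n);
```

With this pattern you can change the block size from 256 to 128 (or 512) at the launch site only, without touching the kernel body or the input data.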