This is my first time posting to this forum, so please be gentle :)
GPU: 8800 GTX.
I am writing a kernel function to perform a wavelet transform via lifting. I have to use multiple __syncthreads() calls (one after each lifting step).
When I run it in emulation mode under gdb, only the code up to the first __syncthreads() gets executed. After that, the program exits the kernel function without executing the remaining lines of code (including the later __syncthreads() calls). I have verified this both in the debugger and by inserting printf statements after the first __syncthreads() call.
The __syncthreads() call is NOT inside any kind of conditional or loop, although there are conditional statements preceding it.
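To make the structure described above concrete, here is a minimal sketch. It is not the actual kernel: the 5/3-style predict and update steps, the single-block launch, and every name in it (liftingSketch, liftingReference, MAX_HALF, half) are assumptions chosen for illustration. It shows conditionals that guard only the work, with each __syncthreads() sitting outside them so that every thread of the block reaches every barrier. The host code compares the result against a CPU reference.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define MAX_HALF 256  // hypothetical upper bound on sample pairs handled by one block

// One lifting level (assumed 5/3-style steps), single block, illustrative only.
__global__ void liftingSketch(const float *in, float *out, int half)
{
    __shared__ float even[MAX_HALF];
    __shared__ float odd[MAX_HALF];

    int t = threadIdx.x;
    bool active = (t < half);  // guards the work; idle threads never return early

    // Split into even and odd samples.
    if (active) {
        even[t] = in[2 * t];
        odd[t]  = in[2 * t + 1];
    }
    __syncthreads();  // outside the conditional: reached by all threads

    // Predict step: reads a neighbour's even sample, so the barrier above is required.
    if (active) {
        int r = (t + 1 < half) ? t + 1 : t;  // clamp at the right edge
        odd[t] -= 0.5f * (even[t] + even[r]);
    }
    __syncthreads();

    // Update step: reads a neighbour's detail coefficient.
    if (active) {
        int l = (t > 0) ? t - 1 : t;         // clamp at the left edge
        even[t] += 0.25f * (odd[l] + odd[t]);
    }
    __syncthreads();  // not strictly needed before the write-out, kept to mirror "one per step"

    if (active) {
        out[t]        = even[t];  // approximation coefficients
        out[half + t] = odd[t];   // detail coefficients
    }
}

// CPU reference performing the same steps sequentially.
static void liftingReference(const float *in, float *out, int half)
{
    float even[MAX_HALF], odd[MAX_HALF];
    for (int i = 0; i < half; ++i) { even[i] = in[2 * i]; odd[i] = in[2 * i + 1]; }
    for (int i = 0; i < half; ++i) {
        int r = (i + 1 < half) ? i + 1 : i;
        odd[i] -= 0.5f * (even[i] + even[r]);
    }
    for (int i = 0; i < half; ++i) {
        int l = (i > 0) ? i - 1 : i;
        even[i] += 0.25f * (odd[l] + odd[i]);
    }
    for (int i = 0; i < half; ++i) { out[i] = even[i]; out[half + i] = odd[i]; }
}

int main()
{
    const int half = 100;      // fewer pairs than threads, so some threads are idle
    const int n = 2 * half;
    float hIn[2 * MAX_HALF], hOut[2 * MAX_HALF], ref[2 * MAX_HALF];
    for (int i = 0; i < n; ++i) hIn[i] = sinf(0.1f * i);

    float *dIn = NULL, *dOut = NULL;
    cudaMalloc((void **)&dIn,  n * sizeof(float));
    cudaMalloc((void **)&dOut, n * sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    liftingSketch<<<1, MAX_HALF>>>(dIn, dOut, half);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    liftingReference(hIn, ref, half);
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (fabsf(hOut[i] - ref[i]) > 1e-5f) ++mismatches;
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The sketch deliberately uses an `active` flag rather than an early `return` in the preceding conditionals: __syncthreads() is only well-defined when every thread of the block reaches it, so a thread that returns before a barrier is one possible way for the later barriers to be skipped or to misbehave.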
Any help will be greatly appreciated!