Nsight debugger hangs although the program executes correctly

Hi everyone, I am new to CUDA and excited about its implications for computer vision and beyond. I am running into a problem whose cause I cannot pinpoint: my program compiles and executes correctly, but the Nsight 4 for Visual Studio debugger hangs on it. When it hangs, it stops progressing after the gaussConvolve kernel launch in the fillScaleSpace function, although sometimes the launch completes twice before the hang (I observed this through printf instrumentation). The debugger does not raise exceptions of any kind, and the Output and Locals windows show no information.

My intuition is that it has something to do with how I have arranged the kernel launches, perhaps with the nested, dynamically computed dimension parameters passed to them. However, I still cannot see why Nsight would hang if the program itself executes correctly.

Thanks for your time; any help would be appreciated.

void fillScaleSpace(uchar**** scaleSpace, uint** octaveDims, uint octaves, uint scales)
{
	dim3 blockDims(16, 16, 1);

	for(uint oct = 0; oct < octaves; oct++)
	{
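		// Round the grid size up in integer arithmetic; ceil((double)(a / b)) would
		// truncate in the integer division before ceil is applied.
		// The kernels must bounds-check against the width and height they receive.
		dim3 gridDims((octaveDims[oct][0] + blockDims.x - 1) / blockDims.x, (octaveDims[oct][1] + blockDims.y - 1) / blockDims.y, 1);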
		for(uint scl = 0; scl < (scales-1); scl++)
		{
			gaussConvolve<<<gridDims, blockDims>>>(scaleSpace, oct, scl, scl+1, octaveDims[oct][0], octaveDims[oct][1]);
			cudaDeviceSynchronize();
			if ((scl == 2) && (oct < (octaves-1)))
			{
				halfSize<<<gridDims, blockDims>>>(scaleSpace, octaveDims, oct);
				cudaDeviceSynchronize();
			}
		}
	}
}
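
A side note on the grid dimensions in fillScaleSpace: the arithmetic can be checked on the host alone, without CUDA. Below is a minimal sketch in plain C++, and I am not claiming it explains the Nsight hang. The names gridSizeTruncating and ceilDiv are hypothetical helpers for illustration only. The first mirrors an expression of the form ceil((double)(a / b)), where the integer division discards the remainder before ceil is applied; the second rounds up using integer arithmetic.

```cpp
#include <cmath>

// Integer division happens first, so the remainder is already gone
// by the time ceil() sees the value; ceil() has nothing left to round.
unsigned int gridSizeTruncating(unsigned int extent, unsigned int block)
{
    return (unsigned int) std::ceil((double)(extent / block));
}

// Hypothetical helper: round up using integer arithmetic only.
// Assumes block > 0 and extent + block - 1 does not overflow.
unsigned int ceilDiv(unsigned int extent, unsigned int block)
{
    return (extent + block - 1) / block;
}
```

For an octave 100 pixels wide and 16-wide blocks, gridSizeTruncating(100, 16) gives 6 blocks (96 pixels covered), while ceilDiv(100, 16) gives 7. When the extent is an exact multiple of the block size, both agree. A rounded-up grid is only safe if the kernel skips threads that fall outside the image.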