I have a kernel that uses shared memory to do a reduction. When I break into the debugger to inspect memory, the shared memory pointer I am looking for isn't in the Locals list. If I right-click its name to add it to the Watch list, Visual Studio freezes immediately and eventually shows as not responding. If I stay paused in that kernel long enough, VS also stops responding.

Code in question:

```
// Kernel is defined before the host wrapper so the launch below can see it.
__global__ void device_summation(keyEntry* matrix, keyEntry* rowSums, int* incGuide,
                                 const int reduction_size, const int matrix_size, const int reductions)
{
    // Dynamically sized shared memory; the size is set by the third launch parameter.
    extern __shared__ keyEntry reductionData[];

    // Transfer from global to shared memory in two halves.
    // For the second half, pad with zeroes where necessary.
    reductionData[threadIdx.x] = matrix[blockIdx.x * matrix_size + threadIdx.x];
    reductionData[threadIdx.x + reduction_size / 2] =
        (matrix_size - 1 < threadIdx.x + reduction_size / 2)
            ? 0
            : matrix[blockIdx.x * matrix_size + threadIdx.x + reduction_size / 2];
    __syncthreads();

    // Each step folds the upper part of the active range onto the lower part.
    for (int i = 0; i < reductions; i++)
    {
        if (threadIdx.x < incGuide[i])
            reductionData[threadIdx.x] += reductionData[threadIdx.x + incGuide[i]];
        __syncthreads();
    }

    // Thread 0 writes this block's row sum.
    if (threadIdx.x == 0) rowSums[blockIdx.x] = reductionData[0];
}

// Host wrapper: one block per row, reduction_size elements of shared memory per block.
void summation(keyEntry* matrix, keyEntry* rowSums, int* incGuide, const int reduction_size,
               const int matrix_size, const int reductions, const int rows, const int threads)
{
    device_summation<<<rows, threads, reduction_size * sizeof(keyEntry)>>>(
        matrix, rowSums, incGuide, reduction_size, matrix_size, reductions);
}
```

I’ve run this program through memcheck and didn’t get any errors from this kernel. I’ve read about other people having problems like this because the shared memory was optimized away. Does that look likely here?

This code seems to work most of the time, so *something* is being created to hold reductionData, although it fails with certain inputs.
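To help narrow down which inputs fail, here is a rough host-side sketch that mirrors the kernel's index arithmetic for one launch configuration. It computes the highest `reductionData` index any thread would touch, and the highest row offset read by the unguarded first load. `LaunchCheck` and `checkLaunch` are hypothetical names made up for this sketch. It also assumes `incGuide` is available as a host-side copy, which may not match how the real code stores it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper type (not part of the original code).
struct LaunchCheck {
    int  maxSharedIndex;  // highest reductionData index any thread touches
    int  maxRowOffset;    // highest offset within a row read by the unguarded load
    bool sharedInBounds;  // true if maxSharedIndex < reduction_size
    bool rowInBounds;     // true if maxRowOffset < matrix_size
};

// Mirrors the indexing in device_summation on the host.
// incGuide is assumed to be a host copy of the device array.
LaunchCheck checkLaunch(int threads, int reduction_size, int matrix_size,
                        const std::vector<int>& incGuide)
{
    assert(threads > 0 && reduction_size > 0 && matrix_size > 0);

    const int half    = reduction_size / 2;
    const int lastTid = threads - 1;

    // Load phase: every thread writes reductionData[tid] and reductionData[tid + half].
    int maxShared = lastTid + half;

    // The first global load, matrix[row * matrix_size + tid], has no guard.
    // The second load is guarded against matrix_size, so it stays within the row.
    const int maxRow = lastTid;

    // Reduction phase: threads with tid < incGuide[i] read reductionData[tid + incGuide[i]].
    for (int s : incGuide) {
        if (s <= 0) continue;  // no thread passes the `threadIdx.x < s` test
        const int lastActive = std::min(s, threads) - 1;
        maxShared = std::max(maxShared, lastActive + s);
    }

    return LaunchCheck{maxShared, maxRow,
                       maxShared < reduction_size,
                       maxRow < matrix_size};
}
```

With made-up values, `checkLaunch(4, 8, 6, {4, 2, 1})` stays in bounds on both counts. `checkLaunch(8, 8, 6, {4, 2, 1})` gives a highest shared index of 11 against 8 allocated elements, and a row offset of 7 against a row length of 6. Whether the failing inputs actually look like the second case is something this sketch can only help check, not confirm.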

Full code is available on GitHub if needed.