NSight 5.5/CUDA 9.1 Weird Performance

I am using NSight 5.5 with CUDA 9.1 and VS 2015 C++ on a Win10 platform.

While debugging my code, I am seeing what appear to be rather random responses from NSight. Here is the relevant part of the kernel:

__syncthreads();
if (tidx == 0)
{
	cyclecount = 0;
	globalend = true;
}
__syncthreads();

// calculate the perimeter
while ((cyclecount*bdim) < tosumElements)
{
	offset = tidx + (cyclecount*bdim);
	if (offset < tosumElements)
	{
		if ((offset >= 0) && (offset < tosumElements))
		{
			// calculate the point to point perimeter
			if (ovalid[offset + tocumElements])
			{
				tmpoffset = orightneighbornum[offset + tocumElements];
				tbool = ovalid[tmpoffset];

			// both branches of this if/else currently compute the same value
			if(tbool)
					operim[offset + tocumElements] = distance(sxvals[offset + tocumElements], syvals[offset + tocumElements], sxvals[tmpoffset + tocumElements], syvals[tmpoffset + tocumElements]);
				else
					operim[offset + tocumElements] = distance(sxvals[offset + tocumElements], syvals[offset + tocumElements], sxvals[tmpoffset + tocumElements], syvals[tmpoffset + tocumElements]);
			}
			else
				operim[offset + tocumElements] = 0;
		}
	}
	__syncthreads();

	if (tidx == 0)
		cyclecount++;

	__syncthreads();
}
__syncthreads();

In this code, I put a breakpoint that should trigger only if tbool is not true (which should never be the case). When I run the code, the breakpoint is hit. However, when I look at the value of tbool, it is true. It can’t be both: either tbool is true (and the breakpoint is not hit) or tbool is false (and the breakpoint is hit). For some reason, NSight is telling me both that tbool is true and that the breakpoint is hit.
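One way to tell whether the kernel is actually computing the wrong thing, as opposed to the debugger misreporting it, is to run the same loop on the host and compare the result against the operim values copied back from the device. The sketch below is a hypothetical reference, not part of the original code. It assumes tidx is threadIdx.x and bdim is blockDim.x (so the block's passes cover offsets 0 to tosumElements - 1 exactly once), that ovalid holds bools and the coordinate and neighbour arrays hold ints, and it uses a double-precision helper, refDistance, in place of distance, so comparisons against the device results should allow a small tolerance.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical double-precision stand-in for the kernel's distance().
static double refDistance(int p1x, int p1y, int p2x, int p2y)
{
    const double dx = static_cast<double>(p1x) - static_cast<double>(p2x);
    const double dy = static_cast<double>(p1y) - static_cast<double>(p2y);
    return std::sqrt(dx * dx + dy * dy);
}

// Host reference for the kernel's perimeter loop: fills
// operim[tocumElements .. tocumElements + tosumElements - 1] with one loop
// iteration per element instead of one GPU thread per element.
void perimeterReference(const std::vector<bool>& ovalid,
                        const std::vector<int>& orightneighbornum,
                        const std::vector<int>& sxvals,
                        const std::vector<int>& syvals,
                        std::vector<double>& operim,
                        int tocumElements, int tosumElements)
{
    for (int offset = 0; offset < tosumElements; ++offset)
    {
        const int idx = offset + tocumElements;
        if (ovalid[idx])
        {
            const int tmpoffset = orightneighbornum[idx];
            // Catch neighbour indices that would make the kernel read
            // sxvals/syvals out of bounds.
            assert(tmpoffset >= 0 &&
                   tmpoffset + tocumElements < static_cast<int>(sxvals.size()));
            // Both branches of if(tbool) in the kernel compute the same value,
            // so tbool does not affect the result and is not read here.
            operim[idx] = refDistance(sxvals[idx], syvals[idx],
                                      sxvals[tmpoffset + tocumElements],
                                      syvals[tmpoffset + tocumElements]);
        }
        else
        {
            operim[idx] = 0.0;
        }
    }
}
```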

I have checked a few other things. When the breakpoint is hit, tidx = 0. I also checked whether there was a memory leak or an access to unallocated memory, and neither appears to be the case.
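Since the breakpoint is supposed to fire only when tbool is false, the data can also be checked directly rather than through the debugger: copy ovalid and orightneighbornum back to the host and count the valid points whose right neighbour is invalid. The function below is a hypothetical sketch. It reproduces the kernel's indexing exactly as posted (ovalid[tmpoffset], without the tocumElements offset used for the other arrays) and assumes ovalid holds bools and orightneighbornum holds ints. If it returns 0 for the device data, the branch really should never run; if it returns a non-zero count, the data or the indexing is worth examining before the debugger.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical host-side check: counts the valid points whose right neighbour
// is invalid, i.e. the cases where the kernel's tbool would be false.
// Indexing mirrors the kernel as posted: ovalid[tmpoffset] without tocumElements.
int countInvalidNeighbours(const std::vector<bool>& ovalid,
                           const std::vector<int>& orightneighbornum,
                           int tocumElements, int tosumElements)
{
    int count = 0;
    for (int offset = 0; offset < tosumElements; ++offset)
    {
        const int idx = offset + tocumElements;
        if (!ovalid[idx])
            continue;                       // the kernel skips invalid points too
        const int tmpoffset = orightneighbornum[idx];
        assert(tmpoffset >= 0 && tmpoffset < static_cast<int>(ovalid.size()));
        if (!ovalid[tmpoffset])             // this is the "tbool == false" case
        {
            std::printf("offset %d (index %d): right neighbour %d is invalid\n",
                        offset, idx, tmpoffset);
            ++count;
        }
    }
    return count;
}
```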

Any help tracking this down is appreciated.

Strange! I’m seeing a very similar error when debugging my code. Have you been able to fix it yet?

The code snippet you posted is too short. It is missing a number of critical variable declarations; in particular, it is not clear which variables are shared and which are not.

The kernel launch parameters, such as the grid and block dimensions, are not clear either.

Best to post compilable, minimal code that exhibits the problem.
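The block size matters for this kernel because it sets how many passes the while loop makes. The loop continues while cyclecount*bdim < tosumElements, so it makes ceil(tosumElements / bdim) passes, and in the last pass the threads whose offset is at or beyond tosumElements skip the body. A small sketch of that arithmetic, assuming bdim is the block's thread count (blockDim.x); the function names are hypothetical:

```cpp
#include <cassert>

// Number of passes the kernel's while loop makes for a given block size:
// the loop continues while cyclecount * bdim < tosumElements, so it runs
// ceil(tosumElements / bdim) times (integer ceiling division below).
int loopPasses(int tosumElements, int bdim)
{
    assert(bdim > 0 && tosumElements >= 0);
    return (tosumElements + bdim - 1) / bdim;
}

// Threads whose offset is >= tosumElements in the last pass and therefore
// skip the loop body (they still reach both __syncthreads() calls).
int idleThreadsInLastPass(int tosumElements, int bdim)
{
    assert(bdim > 0 && tosumElements >= 0);
    const int rem = tosumElements % bdim;
    return rem == 0 ? 0 : bdim - rem;
}
```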

Based on the code he posted, is there any way to figure out a possible solution? I’ve been encountering pretty much the same issue and couldn’t save any log files.

While trying to build the smallest possible code segment that reproduces this error, I determined what triggers it, but there does appear to be a bug in NSight itself.

The code segment listed above called my own function, “distance” (shown below), which in turn called “sqrt”. Thinking that there might be some sort of naming conflict with the name “distance”, I renamed my function to “MyDistance”, but that had no effect. Then, thinking that the sqrt call might be the problem (because I was calling it with ints), I cast those inputs to float, again with no effect.

Ultimately, what I determined is that if this function is in any possible path of execution (even if it is not actually called), it causes an error and NSight hits breakpoints that simply cannot be hit. So, it appears to be an NSight problem.

In the calling code, I replaced the calls to the distance function with the distance code itself (unchanged); that eliminated the NSight error, and the code ran fine.

However, I am seeing the same NSight behavior in another kernel I am writing: breakpoints are being hit that simply can’t be, and the underlying code appears to be correct. So, again, I suspect that this is an NSight bug.

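// Euclidean distance between (p1x, p1y) and (p2x, p2y).
// Note: the squared differences are computed in int, then converted to float.
__device__ double distance(int p1x, int p1y, int p2x, int p2y)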
{
	float tmpx, tmpy;

	tmpx = (p1x - p2x) * (p1x - p2x);
	tmpy = (p1y - p2y) * (p1y - p2y);

	return(sqrt(tmpx + tmpy));
}
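The thread doesn’t show the version with the inputs cast to float. A hypothetical sketch of what it could look like is below, using the MyDistance name from the post. It converts to float before subtracting and squaring (in the function above the squares are computed in int and only then stored in float, so coordinate differences larger than 46,340 in magnitude would overflow), and it calls sqrtf so the arithmetic and the return type are both float, unlike the original, which returns double. The HD macro is an assumption added so the same source compiles as plain host C++ and, under nvcc, as a __host__ __device__ function. This is not presented as a fix for the NSight behavior described in the thread.

```cpp
#include <cassert>
#include <math.h>

// Assumption: lets the same source build as host C++ and, under nvcc,
// as a __host__ __device__ function.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical float version of distance(), named MyDistance as in the post.
// Inputs are converted to float before subtracting and squaring, so large
// coordinate differences cannot overflow int; sqrtf keeps the math in float.
HD inline float MyDistance(int p1x, int p1y, int p2x, int p2y)
{
    const float dx = static_cast<float>(p1x) - static_cast<float>(p2x);
    const float dy = static_cast<float>(p1y) - static_cast<float>(p2y);
    return sqrtf(dx * dx + dy * dy);
}
```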