Local memory corruption? Strange behavior of local memory on device function

Hello, everybody!

I’ll show the relevant code and state the problem later. The snippet is part of a device function. Several threads are running the code in parallel.

GPUEdge edge0 = d_triEdges[currentTest.x];

GPUEdge edge1 = d_triEdges[currentTest.y];

unsigned int aIdx, bIdx , cIdx , dIdx;

aIdx = edge0.v.x;

bIdx = edge0.v.y;

cIdx = edge1.v.x;

dIdx = edge1.v.y;

float3 a	= d_GL_vertices	[aIdx].v;

float3 av	= d_velocities	[aIdx].v;

float3 b	= d_GL_vertices	[bIdx].v;

float3 bv	= d_velocities	[bIdx].v;

float3 c	= d_GL_vertices	[cIdx].v;

float3 cv	= d_velocities	[cIdx].v;

float3 d	= d_GL_vertices	[dIdx].v;

float3 dv	= d_velocities	[dIdx].v;

float fFirstTimeOfContact = Intersect_EE(a , b , c , d , av , bv , cv , dv , deltaTime);

unsigned int uiFirstTimeOfContact;

floatToSortableInt(fFirstTimeOfContact , uiFirstTimeOfContact);


atomicMin(&d_timeOfContact[aIdx] , uiFirstTimeOfContact);

atomicMin(&d_timeOfContact[bIdx] , uiFirstTimeOfContact);

atomicMin(&d_timeOfContact[cIdx] , uiFirstTimeOfContact);

atomicMin(&d_timeOfContact[dIdx] , uiFirstTimeOfContact);

This code tests whether two moving edges intersect (continuous collision). The edges are defined by their initial positions (d_GL_vertices)and velocities (d_velocities). The function Intersect_EE solves a generated cubic equation that defines the first time of contact of the edges. The atomicMins maintain the minimum time for each vertex.

My problem is that the vertex indices (aIdx, bIdx, cIdx and dIdx) are being corrupted. The corrupted values show after Intersect_EE function returns in a Nsight session. For example, when I debug code for threadIdx(6,0,0) the values for aIdx , bIdx, cIdx and dIdx are 3, 4, 0 and 1 before Intersect_EE and 1048664806, 1048664806, 1050784160, 1050784160 after. Its very weird behavior since they are local variables and the Intersect_EE function receives only const parameters as the next snippet shows.

__device__ __inline__ float

Intersect_EE(const float3 &ta0, const float3 &tb0, const float3 &tc0, const float3 &td0,

			 const float3 &tav, const float3 &tbv, const float3 &tcv, const float3 &tdv,

			 /*float3 &qi ,*/ const float deltaTime)


	/* Default value returned if no collision occurs */

	float collisionTime = -1.0f;


	* Compute scalar coefficients by evaluating dot and cross-products.


	float a, b, c, d; /* cubic polynomial coefficients */

	_equateCubic_EE(ta0, tav, tb0, tbv, tc0, tcv, td0, tdv, a, b, c, d);

	if (IsZero(a) && IsZero(b) && IsZero(c) && IsZero(d))

		return collisionTime;

	NewtonCheckData data;

	data[0] = tav , data[1] = ta0;

	data[2] = tbv , data[3] = tb0;

	data[4] = tcv , data[5] = tc0;

	data[6] = tdv , data[7] = td0;


	* iteratively solve the cubic (scalar) equation and test for validity of the solution.


	float l = 0;

	float r = deltaTime;

	float coeffs[4];

	coeffs[0] = a, coeffs[1] = b, coeffs[2] = c, coeffs[3] = d; //changed the coeffs order

	float3 pab;

	if (solveCubic(l, r, false, data, coeffs)) {

		collisionTime = (l+r)*0.5f;	


	return collisionTime;


The configuration I’m using is:

SO: Windows 7

CUDA Toolkit and SDK 4.0

Visual Studio 2010

Thanks in advance.

I’d be curious how the interface to _equateCubic_EE is defined since that first arg, ta0, is an alias of a in the main routine, which is declared next to the vars that are being clobbered. I could imagine the args to equateCubic_EE being declared as pointers instead of refs, and some invalid pointer arithmetic causing your corruption. According to wikipedia, your const declarations mainly say what should happen but aren’t a guarantee, so potentially a called routine could modify those objects. I imagine that varies from one compiler to the next.