Unknown error when accessing a float3 after converting from int3

Hello,

I am getting an unknown error in the function domainDecomposition, at the line where I launch the calcClusterHashD kernel. When I comment out the assignments to the cgp variable (in the calcClusterGridPos function), or change them to assignments of literals such as cgp.x = 33.0f, everything works fine.

The clusterCell array is a flattened two-dimensional array. At the start of each row I store the index for the next write operation. For example, clusterCell[0] starts at 1, so when writing data I read clusterCell[0], write the data (for example 6) to clusterCell[1], and increment clusterCell[0] to 2; the next write then goes to clusterCell[2], and so on.
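The layout described above can be modelled on the host to check the index arithmetic. This is only a sketch: ClusterCells, rowBase and appendToRow are hypothetical names, and the row stride of maxPerRow + 1 is taken from the expression ncgh*(maxNeighbourhoodParticles+1) used in the kernel further down.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of the clusterCell layout described above.
// Each row has maxPerRow + 1 slots; slot 0 of a row stores the index
// (within that row) of the next free slot and starts at 1.
struct ClusterCells {
    std::uint32_t maxPerRow;
    std::vector<std::uint32_t> data;

    ClusterCells(std::uint32_t rows, std::uint32_t maxPerRowIn)
        : maxPerRow(maxPerRowIn),
          data(static_cast<std::size_t>(rows) * (maxPerRowIn + 1), 0) {
        for (std::uint32_t r = 0; r < rows; ++r) data[rowBase(r)] = 1;
    }

    // Offset of the first slot (the counter) of the given row.
    std::size_t rowBase(std::uint32_t row) const {
        return static_cast<std::size_t>(row) * (maxPerRow + 1);
    }
};

// Appends value to the given row; returns false if the row does not
// exist or is already full, so nothing is ever written out of bounds.
bool appendToRow(ClusterCells& cells, std::uint32_t row, std::uint32_t value) {
    const std::size_t base = cells.rowBase(row);
    if (base >= cells.data.size()) return false;   // row does not exist
    const std::uint32_t next = cells.data[base];
    if (next > cells.maxPerRow) return false;      // row is full
    cells.data[base + next] = value;               // slot is relative to the row base
    cells.data[base] = next + 1;
    return true;
}
```

Note that the slot index is added to the row base before writing. Without that offset, every row other than row 0 would write into row 0's slots.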

In system.cu:

void domainDecomposition(int3 * test, uint vboPos,  uint2 *particleHash, uint* clusterCell, uint numParticles, uint maxNeighbourhoodParticles)
{
	int numThreads, numBlocks;
	computeGridSize(numParticles, 256, numBlocks, numThreads);

	float4 *pos;
	cuMapVbo(pos, vboPos);

	calcClusterHashD<<< numBlocks, numThreads >>>(pos, particleHash);
	CUT_CHECK_ERROR("Kernel execution failed: calcClusterHash");
	/***/threadSync();

	calcClusterCellNeighbourhood<<<numBlocks, numThreads>>>(test, pos, clusterCell, maxNeighbourhoodParticles);
	CUT_CHECK_ERROR("Kernel execution failed: calcCellNeighbour");
	//printf("%s \n",cudaGetErrorString(cudaGetLastError()));

	cuUnMapVbo(vboPos);
	/***/threadSync();
}
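One thing worth checking in a launch like this is whether more threads are started than there are particles. The body of computeGridSize is not shown in the post, so the sketch below assumes it rounds the block count up; computeGridSizeSketch and surplusThreads are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Assumed behaviour of computeGridSize; its body is not shown in the post.
// Same argument order as the call in domainDecomposition.
void computeGridSizeSketch(unsigned n, unsigned blockSize,
                           unsigned& numBlocks, unsigned& numThreads) {
    assert(n > 0 && blockSize > 0);
    numThreads = std::min(blockSize, n);
    numBlocks = (n + numThreads - 1) / numThreads;  // round up
}

// Number of launched threads whose index would be >= n.
unsigned surplusThreads(unsigned n, unsigned blockSize) {
    unsigned numBlocks = 0, numThreads = 0;
    computeGridSizeSketch(n, blockSize, numBlocks, numThreads);
    return numBlocks * numThreads - n;
}
```

Under this assumption, 1000 particles with a block size of 256 launch 1024 threads, so 24 threads get an index past the end of the arrays. The usual remedy is a guard such as if (index >= numParticles) return; at the top of each kernel. The kernels in this thread do not receive numParticles, so it would have to be added as a parameter.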

In kernel.cu:

__global__ void calcClusterCellNeighbourhood(int3 *test, float4 *pos, uint *clusterCellNeighbourhood, uint maxNeighbourhoodParticles)
{
	/// calculate particle index
	int index  = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

	float4 p = pos[index]; /// get particle by index
	int3 gp = calcGridPos(p); /// calculate cell grid position according to particle position

	// current cluster grid hash
	uint cgh = calcClusterGridHash(gp);
	uint currentSaveIDX = 1;
	__shared__ bool lock;// = false;
	const int s = 1;

	for(int z=-s; z<=s; z++)
	for(int y=-s; y<=s; y++)
	for(int x=-s; x<=s; x++)
	{
		uint ncgh = calcClusterGridHash(gp + make_int3(x,y,z));
		if (ncgh != cgh)
		{
			if (ncgh > 0)
			{
				while (lock)
				{
					lock = true;
					currentSaveIDX = clusterCellNeighbourhood[ncgh*(maxNeighbourhoodParticles+1)];
					clusterCellNeighbourhood[currentSaveIDX] = index;
					clusterCellNeighbourhood[ncgh*(maxNeighbourhoodParticles+1)] = currentSaveIDX+1;
					lock = false;
				}
			}
		}
	}
}
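A note on the lock flag in the kernel above: it is a __shared__ variable that is never initialised (the initialiser is commented out), it is visible only within one block, and a plain flag that many threads read and write does not give mutual exclusion. If it happens to be false, the while body never runs at all. A common alternative is to reserve a slot with a single atomic increment. The host-side sketch below shows the idea with std::atomic; Row and push are hypothetical names, and on the device the analogous operation would be atomicAdd on the row's counter word.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical model of one row of clusterCell: a counter holding the next
// free slot (starting at 1, as in the post) plus the slots themselves.
struct Row {
    std::atomic<std::uint32_t> next{1};
    std::vector<std::uint32_t> slots;
    explicit Row(std::uint32_t capacity) : slots(capacity + 1, 0) {}
};

// Reserve a slot with one atomic read-modify-write, then fill it.
// fetch_add returns the previous counter value, so no two callers
// can ever receive the same slot index.
bool push(Row& row, std::uint32_t value) {
    const std::uint32_t slot = row.next.fetch_add(1);
    if (slot >= row.slots.size()) return false;  // row is full; value is dropped
    row.slots[slot] = value;
    return true;
}
```

The design choice here is that the counter update and the slot reservation are a single indivisible step, so no separate lock is needed. The counter may run past the capacity when the row is full, which is why the slot is checked before writing.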

In kernel.cui:

__device__ int3 calcClusterGridPos(int3 gridPos)
{
	int3 clusterGridPos;
	float3 cgp = float3();
	float3 gpf = make_float3(gridPos);

	cgp.x = gpf.x;// / 50.0f;
	cgp.y = gpf.y;// / 63.0f;
	cgp.z = gpf.z;// / 25.0f;

	clusterGridPos.x = floor(cgp.x);
	clusterGridPos.y = floor(cgp.y);
	clusterGridPos.z = floor(cgp.z);

	return clusterGridPos;
}

__device__ uint calcClusterGridHash(int3 gridPos)
{
	int3 clusterGridPos = calcClusterGridPos(gridPos); 
	return __mul24(clusterGridPos.z, par.clusterGridSize_YX)
		+ __mul24(clusterGridPos.y, par.clusterGridSize.x) + clusterGridPos.x;
}

Thanks in advance for your help.

The error is in the line

float3 gpf = make_float3(gridPos);

as the function’s signature is

float3 make_float3(float x, float y, float z)

I’m surprised the compiler doesn’t nag about it. If you assign constants to cgp, the compiler simply erases the faulty line, as gpf is unused.

BTW:

uint ncgh = calcClusterGridHash(gp + make_int3(x,y,z));

I don’t think CUDA defines any operator on the vector types.

I’m using VS2010 and I can see 7 overloads of make_float3, one of which takes an int3 as its parameter:

inline __host__ __device__ float3 make_float3(int3 a)
{
    return make_float3(float(a.x), float(a.y), float(a.z));
}

The operators for int3 and the make_float3 function work fine for me in other places, so I assume this section is fine, but I’ll try a solution without the operators and make_float3.

Apparently cutil adds those vector-functions to CUDA. So it seems now that calcClusterGridPos doesn’t really do anything worthwhile and calcClusterGridHash is correct assuming par is correctly set up. (BTW __mul24 is slower than normal multiplication if you’re on Fermi.)
So the error must be outside and fixing the indices just hides it. Thus I expect an out-of-bounds array access. You could try adding the condition ncgh < some_nmax as well as check the currentSaveIDX before using it. (Assuming calcClusterCellNeighbourhood is the crashing kernel you actually meant.)
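To illustrate the range check suggested above: the neighbour loop adds offsets from -1 to 1 to the grid position, so a particle in a border cell can produce a negative coordinate, and the hash then wraps around to a very large unsigned value that still passes the ncgh > 0 test. The sketch below is host-side C++ under assumptions: Int3 stands in for CUDA's int3, size stands in for par.clusterGridSize, and par.clusterGridSize_YX is assumed to equal size.y * size.x. Both function names are hypothetical.

```cpp
#include <cstdint>

struct Int3 { int x, y, z; };  // stand-in for CUDA's int3

// Same formula as calcClusterGridHash, using plain multiplication
// instead of __mul24. No range check: negative input wraps around.
std::uint32_t clusterHashUnchecked(Int3 p, Int3 size) {
    return static_cast<std::uint32_t>(p.z * (size.y * size.x) + p.y * size.x + p.x);
}

// Returns false when p lies outside the grid; otherwise stores the hash in out.
bool clusterHashChecked(Int3 p, Int3 size, std::uint32_t& out) {
    if (p.x < 0 || p.y < 0 || p.z < 0 ||
        p.x >= size.x || p.y >= size.y || p.z >= size.z) {
        return false;
    }
    out = clusterHashUnchecked(p, size);
    return true;
}
```

With a 4 x 5 x 6 grid, the position (-1, 0, 0) gives an unchecked hash of 4294967295, which would index far outside any array of realistic size. Checking the coordinates before hashing also avoids aliasing, where an out-of-range position maps onto a valid but wrong cell.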

MarkusM, I’ve tried to find the error using your clues, but I still can’t find it. After commenting out the launch of calcClusterCellNeighbourhood<<<numBlocks, numThreads>>>(test, pos, clusterCell, maxNeighbourhoodParticles); I noticed that the application crashes at this line: particleClusterHash[index] = make_uint2(clusterGridHash, index);

When that line is commented out, the app works without errors.

__global__ void calcClusterHashD(float4 *pos, uint2* particleClusterHash)
{
	// calculate particle index
	int index  = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

	float4 p = pos[index]; /// get particle by index
	int3 gp = calcGridPos(p); /// calculate cell grid position according to particle position
	int3 gcp = calcClusterGridPos(gp); /// calculate cluster grid position with cell grid position --------------> MOVED TO calcClusterGridHash(uint3 gridPos)

	/// calculate hash
	uint clusterGridHash = calcClusterGridHash(gp); 

	/// save hash and particle index to array
	particleClusterHash[index] = make_uint2(clusterGridHash, index);
}

When I change

particleClusterHash[index] = make_uint2(clusterGridHash, index);

to

make_uint2(clusterGridHash, index);
	particleClusterHash[index] = make_uint2(0, 2);

the application works without errors.

Any ideas what is wrong?

Are there any tools I can use to track down errors like this?