Sometimes right, sometimes wrong... Forward Mapping

Hi!

I have a serious problem with my project.

I am implementing a forward mapping algorithm with CUDA.

I don’t think my code is wrong, but sometimes I get wrong results.

Running exactly the same code produces different results from one run to the next.

Here is the core code of my program.

//<<------------------------- Host code --------------------------->>

...

dim3 dimBlock(30,20,1);

dim3 dimGrid(xWidth / dimBlock.x, yHeight / dimBlock.y, 1);

for(i=0;i<TOTALVIEW-2;i++)

{

	InitializeVirtualViewKernel<<< dimGrid, dimBlock, 0 >>>(d_OutView[i].m_pImageData,d_OutView[i].m_pDepthMap,xWidth,xWidthStep);

}

	

for(i=0;i<TOTALVIEW-2;i++)

{

	ForwardMappingKernel<<< dimGrid, dimBlock, 0 >>> (d_OutView[i].m_pInputCoef,d_pInputInfo,d_OutView[i].m_pImageData,d_OutView[i].m_pDepthMap,xWidth,yHeight,xWidthStep);

}

...

//<<------------------------- device code --------------------------->>

texture<float4, 2, cudaReadModeElementType> texCenterView;

__global__ void

InitializeVirtualViewKernel(unsigned char* d_pOutImage, unsigned char* d_pOutDepth, int xWidth, int xWidthStep)

{

	unsigned int iX = blockIdx.x * blockDim.x + threadIdx.x;

	unsigned int iY = blockIdx.y * blockDim.y + threadIdx.y;

	d_pOutDepth[iY*xWidth + iX] = 0;

	d_pOutImage[iY*xWidthStep + 3*iX + 0] = 0;

	d_pOutImage[iY*xWidthStep + 3*iX + 1] = 0;

	d_pOutImage[iY*xWidthStep + 3*iX + 2] = 0;

}

__global__ void

ForwardMappingKernel(MapCoef* d_pInputCoef, ViewInfo* d_pInputInfo, unsigned char* d_pOutImage, unsigned char* d_pOutDepth, int xWidth, int yHeight, int xWidthStep)

{

	unsigned char cDisparity;

	unsigned char vDisparity;

	float zDepth;

	unsigned int nX, nY;

	unsigned char R,G,B;

	unsigned int iX = blockIdx.x * blockDim.x + threadIdx.x;

	unsigned int iY = blockIdx.y * blockDim.y + threadIdx.y;

	

	float fU = iX / (float) xWidth;

	float fV = iY / (float) yHeight;

	float4 texData = tex2D(texCenterView, fU, fV);

	

	B = (unsigned char) texData.x; // Blue

	G = (unsigned char) texData.y; // Green

	R = (unsigned char) texData.z; // Red

	cDisparity = (unsigned char) texData.w; // Depth Disparity

	zDepth = d_pInputInfo->Scale_D2Z * (cDisparity * d_pInputInfo->SSrl);

	float fX = (float) iX;

	float fY = (float) iY;

	float t = d_pInputCoef->a9*fX + d_pInputCoef->a10*fY + d_pInputCoef->a11*zDepth + d_pInputCoef->a12;


	float xCoor = (d_pInputCoef->a1*fX + d_pInputCoef->a2*fY + d_pInputCoef->a3*zDepth + d_pInputCoef->a4) / t + d_pInputCoef->a13;

	float yCoor = (d_pInputCoef->a5*fX + d_pInputCoef->a6*fY + d_pInputCoef->a7*zDepth + d_pInputCoef->a8) / t + d_pInputCoef->a14;

	if(xCoor < 0 || xCoor > xWidth-1 || yCoor < 0 || yCoor > yHeight-1)

		return;

	nX = (unsigned int) (xCoor + 0.5f);

	nY = (unsigned int) (yCoor + 0.5f);

	

	vDisparity = ((unsigned char*) d_pOutDepth)[nY*xWidth + nX];

	if(cDisparity > vDisparity)

	{

		d_pOutDepth[nY*xWidth + nX] = cDisparity;

	}

	d_pOutImage[nY*xWidthStep + 3*nX + 0] = B; // Blue

	d_pOutImage[nY*xWidthStep + 3*nX + 1] = G; // Green

	d_pOutImage[nY*xWidthStep + 3*nX + 2] = R; // Red

	

}

Two result images are attached: one is right, and the other is wrong.

What should I do to fix this problem?

Please help me.

Thank you.

////////////////////////////////////////////////////////////

  • 32-bit Windows XP

  • CUDA toolkit release version 1.1

  • Visual Studio 2005

  • GeForce 8800 GTS 512


    [Attached image]

The maximum number of threads per block is 512 on G80, but you seem to be using 30 * 20 = 600.
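
One way to stay under the limit is sketched below, assuming a 16 x 16 block (256 threads). `ClearKernel` is a hypothetical stand-in for `InitializeVirtualViewKernel`, and `fitsDevice` and `gridFor` are illustrative helper names that do not appear in the original code. The grid is rounded up so that image sizes that are not a multiple of the block size are still covered, which is why the kernel needs a bounds guard.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for InitializeVirtualViewKernel: clears a depth map.
__global__ void ClearKernel(unsigned char* d_pOutDepth, int xWidth, int yHeight)
{
    unsigned int iX = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int iY = blockIdx.y * blockDim.y + threadIdx.y;
    // The grid is rounded up, so threads past the image edge must do nothing.
    if (iX >= (unsigned int)xWidth || iY >= (unsigned int)yHeight) return;
    d_pOutDepth[iY * xWidth + iX] = 0;
}

// Returns true if the block has no more threads than the device allows.
bool fitsDevice(dim3 block, int device = 0)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return block.x * block.y * block.z <= (unsigned int)prop.maxThreadsPerBlock;
}

// Rounds the grid up so every pixel is covered by some thread.
dim3 gridFor(int xWidth, int yHeight, dim3 block)
{
    return dim3((xWidth + block.x - 1) / block.x,
                (yHeight + block.y - 1) / block.y, 1);
}
```

With these helpers, the host code would build `dim3 dimBlock(16, 16, 1)`, confirm `fitsDevice(dimBlock)`, and launch with `gridFor(xWidth, yHeight, dimBlock)`.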

It’s good practice to check for errors after launching a kernel (for example with cudaGetLastError()) to catch this kind of error.
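
A minimal sketch of such a check follows. `checkLaunch`, `tryLaunch` and `EmptyKernel` are hypothetical names used only for illustration. The sketch calls `cudaDeviceSynchronize()`, which is the current spelling; as far as I know, toolkits as old as 1.1 provide `cudaThreadSynchronize()` for the same purpose.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that does nothing; only the launch itself matters here.
__global__ void EmptyKernel() {}

// Reports any error from the most recent launch; returns true on success.
bool checkLaunch(const char* name)
{
    // Launch-time errors (for example an invalid block size) show up here.
    cudaError_t err = cudaGetLastError();
    // Errors raised while the kernel runs show up after synchronizing.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", name, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Launches EmptyKernel with the given block shape and reports whether it ran.
bool tryLaunch(dim3 block)
{
    EmptyKernel<<< dim3(1, 1, 1), block, 0 >>>();
    return checkLaunch("EmptyKernel");
}
```

Calling `checkLaunch("ForwardMappingKernel")` right after each launch in the host loop would report a rejected launch configuration instead of letting it pass silently.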

Thanks a lot!

I fixed it.

But I don’t understand why two different results come out…

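Probably because the kernels never ran: when a launch fails, the output buffers keep whatever values earlier allocations or runs left in that device memory, so the image you read back depends on stale data.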
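
One way to make a failed launch easier to spot is to clear the output buffers from the host before launching, so that a kernel that never ran leaves a blank image rather than leftover data. A minimal sketch, assuming the image and depth map are plain byte buffers as in the posted code; `clearOutputs` and its size parameters are hypothetical:

```cuda
#include <cuda_runtime.h>

// Zeroes an output image and a depth map from the host side.
// cudaMemset does not use the caller's launch configuration, so the buffers
// are cleared even if a later kernel launch is rejected.
cudaError_t clearOutputs(unsigned char* d_pImage, size_t imageBytes,
                         unsigned char* d_pDepth, size_t depthBytes)
{
    cudaError_t err = cudaMemset(d_pImage, 0, imageBytes);
    if (err != cudaSuccess) return err;
    return cudaMemset(d_pDepth, 0, depthBytes);
}
```

This does not replace error checking; it only makes the symptom repeatable when something does go wrong.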

Good luck

Regards,