Local memory slowing down program even though it's not being used!

Hi all, I have a strange problem with the code below. It is a real local memory hog: nvcc reports that it uses about 3000 bytes of local memory. With two objects in my scene (I’m writing a raytracer, if you didn’t guess from the code), the frame rate is 29 fps. These objects are specularly shaded, so execution skips the whole bottom half of the function (everything from the line “if(numCurrRays > 0)”) and just returns the colour.

If I comment out the code that doesn’t get used, but keep the ray array declarations (which I guess are what is being stored in local memory), the frame rate rises to 43 fps and nvcc reports that I’m using virtually zero local memory, even though the array declarations haven’t been commented out. So what’s going on? Has anyone got an explanation?

  1. Perhaps the local memory only gets reported by nvcc if it gets used?

  2. And if so, does the amount reported depend on how many operations get performed on the memory (even though the code that does this doesn’t get run)?

  3. Why does the fps rise to 43fps even though the code I commented out isn’t getting executed anyway?

Please give your ideas!

[codebox]__device__ RGBColour shade(ShadeRec& originalSR, const RGBColour& backgroundColour)
{
    int i = 0;
    float3 wo, wi, wt;
    RGBColour fr, ft;
    ShadeRec sr[17];
    Ray newRays[16];
    Ray currRays[16];

    sr[0] = originalSR;
    int currentSR = 1, numNewRays = 0, numCurrRays = 0;

    if(sr[0].material.y == 1)
        sr[0].colour = specularShade(sr[0]);
    else if(sr[0].material.y == 2)
    {
        sr[0].colour = specularShade(sr[0]);
        wo = -sr[0].ray.d;
        fr = reflectiveF(sr[0], wo, wi);
        currRays[numCurrRays].o = sr[0].hitPoint;
        currRays[numCurrRays].d = wi;
        currRays[numCurrRays].parent = 0;
        currRays[numCurrRays].transparent = 0;
        numCurrRays++;
    }
    else
    {
        sr[0].colour = specularShade(sr[0]);
        wo = -sr[0].ray.d;
        fr = reflectiveF(sr[0], wo, wi);
        ft = transparentF(sr[0], wo, wt);
        currRays[numCurrRays].o = sr[0].hitPoint;
        currRays[numCurrRays].d = wi;
        currRays[numCurrRays].parent = 0;
        currRays[numCurrRays].transparent = 0;
        numCurrRays++;
        if(!tir(sr[0]))
        {
            currRays[numCurrRays].o = sr[0].hitPoint;
            currRays[numCurrRays].d = wt;
            currRays[numCurrRays].parent = 0;
            currRays[numCurrRays].transparent = 1;
            numCurrRays++;
        }
    }

    if(numCurrRays > 0)
    {
        for(i = 0; i < vp.maxDepth; i++)
        {
            for (int currRay = 0; currRay < numCurrRays; currRay++)
            {
                sr[currentSR] = hitObjects(currRays[currRay]);
                sr[currentSR].ray = currRays[currRay];
                if (sr[currentSR].hitAnObject)
                {
                    if(sr[currentSR].material.y == 1)
                        sr[currentSR].colour = specularShade(sr[currentSR]);
                    else if(sr[currentSR].material.y == 2)
                    {
                        sr[currentSR].colour = specularShade(sr[currentSR]);
                        wo = -sr[currentSR].ray.d;
                        fr = reflectiveF(sr[currentSR], wo, wi);
                        newRays[numNewRays].o = sr[currentSR].hitPoint;
                        newRays[numNewRays].d = wi;
                        newRays[numNewRays].parent = currentSR;
                        newRays[numNewRays].transparent = 0;
                        numNewRays++;
                    }
                    else
                    {
                        sr[currentSR].colour = specularShade(sr[currentSR]);
                        wo = -sr[currentSR].ray.d;
                        fr = reflectiveF(sr[currentSR], wo, wi);
                        ft = transparentF(sr[currentSR], wo, wt);
                        newRays[numNewRays].o = sr[currentSR].hitPoint;
                        newRays[numNewRays].d = wi;
                        newRays[numNewRays].parent = currentSR;
                        newRays[numNewRays].transparent = 0;
                        numNewRays++;
                        if(!tir(sr[currentSR]))
                        {
                            newRays[numNewRays].o = sr[currentSR].hitPoint;
                            newRays[numNewRays].d = wt;
                            newRays[numNewRays].parent = currentSR;
                            newRays[numNewRays].transparent = 1;
                            numNewRays++;
                        }
                    }
                }
                else
                    sr[currentSR].colour = backgroundColour;
                currentSR++;
            }

            for (int j = 0; j < numNewRays; j++)
            {
                currRays[j] = newRays[j];
            }
            numCurrRays = numNewRays;
            numNewRays = 0;
        }

        //Post process ShadeRec tree and get final colour of originalSR
        for(i = currentSR - 1; i > 0; i--)
        {
            if(sr[sr[i].ray.parent].material.y == 2)
            {
                wo = -sr[sr[i].ray.parent].ray.d;
                fr = reflectiveF(sr[sr[i].ray.parent], wo, wi);
                sr[sr[i].ray.parent].colour += fr * sr[i].colour * dot(sr[sr[i].ray.parent].normal, wi);
            }
            else if(sr[sr[i].ray.parent].material.y == 3)
            {
                wo = -sr[sr[i].ray.parent].ray.d;
                if(sr[i].ray.transparent == 0)
                {
                    fr = reflectiveF(sr[sr[i].ray.parent], wo, wi);
                    sr[sr[i].ray.parent].colour += fr * sr[i].colour * fabs(dot(sr[sr[i].ray.parent].normal, wi));
                }
                else
                {
                    ft = transparentF(sr[sr[i].ray.parent], wo, wt);
                    sr[sr[i].ray.parent].colour += ft * sr[i].colour * fabs(dot(sr[sr[i].ray.parent].normal, wt));
                }
            }
        }
    }

    return (sr[0].colour);
}
[/codebox]

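newRays is probably being optimized away entirely when the lower portion of the code is commented out, since nothing references it any more. The local memory figure nvcc reports is fixed at compile time from the code that survives optimization; it does not depend on which branches actually run.

I think I read somewhere that arrays can be kept in registers if they are always indexed with constants (or with values the compiler can determine at compile time). In other words, currRays[0] and currRays[1] can be reduced to registers instead of requiring local memory storage. With the loop commented out, numCurrRays can only be 0 or 1 at each write, which the compiler can work out. With the loop present, currRays[currRay] and sr[currentSR] are indexed by run-time values, so the arrays presumably have to live in local memory. That would explain the speed boost too: in the full version even the top half's accesses to sr[0] and currRays would go through slow local memory, whether or not the loop ever runs.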
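To make the constant-index point concrete, here is a minimal sketch (not taken from the raytracer above; the kernel names, the `runDemo` helper and the array sizes are all made up for illustration). One kernel indexes a small array only with compile-time constants, the other with a value read at run time. Whether the second one really ends up in local memory depends on the compiler version and target architecture, so treat it as something to check rather than a guarantee.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every index is a compile-time constant, so the
// compiler is free to keep a[0]..a[3] in registers.
__global__ void constantIndex(float* out)
{
    float a[4];
    a[0] = 1.0f; a[1] = 2.0f; a[2] = 3.0f; a[3] = 4.0f;
    out[threadIdx.x] = a[0] + a[1] + a[2] + a[3];
}

// Hypothetical kernel: the final index comes from memory at run time, so the
// array generally needs an address, which usually means local memory.
__global__ void dynamicIndex(const int* idx, float* out)
{
    float a[16];
    for (int k = 0; k < 16; ++k)
        a[k] = 1.0f + k;
    out[threadIdx.x] = a[idx[threadIdx.x] & 15];
}

// Host helper (assumed name): runs both kernels on 32 threads and checks
// that the results are what the code above should produce.
bool runDemo()
{
    const int n = 32;
    int hIdx[n];
    float hA[n], hB[n];
    for (int i = 0; i < n; ++i) hIdx[i] = i;

    int* dIdx = 0;
    float* dA = 0;
    float* dB = 0;
    if (cudaMalloc(&dIdx, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dA, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dB, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(dIdx, hIdx, n * sizeof(int), cudaMemcpyHostToDevice);

    constantIndex<<<1, n>>>(dA);
    dynamicIndex<<<1, n>>>(dIdx, dB);

    bool ok = (cudaGetLastError() == cudaSuccess);
    // cudaMemcpy on the default stream waits for the kernels to finish.
    ok = ok && cudaMemcpy(hA, dA, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(hB, dB, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dIdx); cudaFree(dA); cudaFree(dB);

    for (int i = 0; ok && i < n; ++i)
    {
        if (hA[i] != 10.0f) ok = false;             // 1 + 2 + 3 + 4
        if (hB[i] != 1.0f + (i & 15)) ok = false;   // element picked at run time
    }
    return ok;
}
```

Compiling this with `nvcc -Xptxas -v` prints the register and local memory usage per kernel, so the two kernels can be compared directly. The same check on the shade() function, with and without the lower half commented out, should show where the roughly 3000 bytes come from.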