Getting "too many requested resources to launch"

Hi there!

I’m a newcomer to CUDA, and I’m trying to write a program related to isosurface rendering. In this program I call the following function:

__device__ bool searchValue2(tGPUMatryoshka gpuMatryoshka, int searchPosVal1, int val2)
{
    int beginPos, searchPos, lastPos;

    beginPos = gpuMatryoshka.cols_value2_index[searchPosVal1];
    if(searchPosVal1!=(gpuMatryoshka.nCols-1))
        lastPos = gpuMatryoshka.cols_value2_index[searchPosVal1+1];
    else
        lastPos = gpuMatryoshka.nActiveCells - 1;

    bool foundElem, foundPos;
    foundElem = foundPos = false;
    foundElem = val2<10;

    if(lastPos<beginPos)
        return false;

    foundElem = (gpuMatryoshka.cols_value2[beginPos]==val2);
    foundPos = ((lastPos-beginPos)<=2);
    if(beginPos!=lastPos)
        foundElem = foundElem || (gpuMatryoshka.cols_value2[lastPos]==val2);

    if(gpuMatryoshka.cols_value2[beginPos]<val2 && !foundElem && !foundPos)
        while(true)
        {
            searchPos = (beginPos+lastPos)/2;
            foundElem = (gpuMatryoshka.cols_value2[beginPos]==val2) || (gpuMatryoshka.cols_value2[searchPos]==val2) || (gpuMatryoshka.cols_value2[lastPos]==val2);
            foundPos = (beginPos==lastPos) || (beginPos+1==lastPos);
            if(foundElem || foundPos)
                break;
            if(gpuMatryoshka.cols_value2[searchPos]<val2)
            {
                beginPos = searchPos;
            }
            else
            {
                lastPos = searchPos;
            }
        }

    return foundElem;
}

If I comment out the code in red, the function doesn’t give me any problems, but if I run the program with that code in place, it fails with an error that reads: “too many requested resources to launch”. The program runs and executes correctly in EmuRelease mode; however, I think this is because in EmuRelease mode the PC executes the code on the CPU and doesn’t simulate the GPU threads.

When I searched for the problem on Google, I found that this error is related to registers or shared memory. This code doesn’t use shared memory, which leaves me only one option to explore: register overuse. Could someone teach me a way to find out how many registers my GPU kernel uses? Does anyone know why this code might not run correctly on the GPU?

Thanks a lot in advance.

It’s most likely too many registers. Just reduce the number of threads per block.
You can see the register usage of your kernel by compiling with --ptxas-options=-v, or by looking in the .cubin file.
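
To make the "reduce the number of threads per block" advice concrete, here is a rough sketch of the arithmetic, written as plain host-side C++ (it should also compile inside a .cu file). The helper names fitsRegisterBudget and maxThreadsForRegisters are made up for illustration, and the defaults (warp size 32, at most 512 threads per block) are assumptions about older hardware; check your own device's limits. It also ignores the register allocation granularity of the hardware, so treat the result as an estimate only.

```cpp
// Sketch only: estimates whether a launch fits in the register budget.
// regsPerThread is the number reported by --ptxas-options=-v;
// regsPerBlock is the register limit of your GPU (look it up for your device).

// True when threadsPerBlock threads, each using regsPerThread registers,
// need no more than regsPerBlock registers in total.
bool fitsRegisterBudget(int regsPerThread, int threadsPerBlock, int regsPerBlock)
{
    return static_cast<long long>(regsPerThread) * threadsPerBlock <= regsPerBlock;
}

// Largest block size that fits, rounded down to a multiple of warpSize
// and capped at maxThreads. Returns 0 when not even one warp fits.
// The default warpSize and maxThreads are assumptions, not queried values.
int maxThreadsForRegisters(int regsPerThread, int regsPerBlock,
                           int warpSize = 32, int maxThreads = 512)
{
    if (regsPerThread <= 0)
        return maxThreads;                      // no register pressure to account for
    int limit = regsPerBlock / regsPerThread;   // threads the budget allows
    if (limit > maxThreads)
        limit = maxThreads;                     // hardware cap on block size
    return (limit / warpSize) * warpSize;       // whole warps only
}
```

For example, with a hypothetical 20 registers per thread and an assumed budget of 8192 registers, a 512-thread block would need 10240 registers and does not fit, while maxThreadsForRegisters(20, 8192) gives 384 threads (7680 registers).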

Can you please tell me the directory where Visual Studio creates the .cubin file?

Thanks for your answer… :)

You can find the .cubin file in your project folder, where your source file is located.

Thanks n Regards,
Sadhana