Stack underflow with NSight

Hi Guys,

This is probably a newbie question. I have a very simple piece of code.

In the .cuh file, I defined a few functions. One of them is:

__device__ void myfunction(float* Args,short* param)

{

Args=&somearray[0];

}

I also defined a variable myfloatArray as:

__device__ float* myfloatArray;

My kernel has this form:

extern "C" __global__ void myKernel(float* arrayfloat)

{

    int num_thread=blockIdx.x * blockDim.x + threadIdx.x;

    if (num_thread==0)

    {

        myfloatArray=arrayfloat;

        short param=1;

        myfunction(myfloatArray,&param);

    }

}

I limited the work to num_thread==0 so that only one thread goes through the code, which makes it quick to debug with Nsight. The reason I am using the variable myfloatArray is to make the pointer arrayfloat available globally to all the device functions I want to write and use in my kernel, without passing the pointer to each of them (maybe not the smartest solution, but it seems to work).

My problem is that when I debug and step into myfunction, the debugger does not even enter the function. Already on the first line it throws this error:

Parallel Nsight Debug

Detected data stack underflow on 1 threads. First thread:

blockIdx = {0,0,0}

threadIdx = {0,0,0}

StackPointer = 0x00ffae00

StackLimit = 0x00fff840

PC = 0x0010ada8

FunctionRelativePC = 0x00000aa8

Would you have any idea what I did wrong?

I don’t know about the stack underflow. However, the compiler has most likely optimized myfunction() away completely, as it has no effect.

Well, that was not my intention. I thought myfunction would make Args point to a new area of memory, so it should have an effect…

Since C (and CUDA) uses “call by value” semantics, myfunction() has a local copy of its arguments, which can be changed at will without influencing the caller.
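To make the call-by-value point concrete, here is a minimal host-side C++ sketch (no GPU needed). It is only an illustration: `somearray` here is a hypothetical stand-in for the array in the question, and `reassign_by_value` and `caller_pointer_unchanged` are made-up names. The same parameter-passing rule applies to `__device__` functions.

```cpp
#include <cassert>

// Hypothetical stand-in for the `somearray` used in the question.
static float somearray[4] = {1.0f, 2.0f, 3.0f, 4.0f};

// The pointer itself is passed by value, so `args` is a local copy.
// Redirecting it changes only that copy, never the caller's pointer.
void reassign_by_value(float* args) {
    args = &somearray[0];
    (void)args;  // avoid an "assigned but unused" warning
}

// Returns true if the caller's pointer still points where it did before the call.
bool caller_pointer_unchanged() {
    float other[2] = {9.0f, 8.0f};
    float* p = other;
    reassign_by_value(p);
    assert(p != somearray);  // the assignment inside the function was lost
    return p == other;
}
```

Calling `caller_pointer_unchanged()` returns true: after the call, `p` still points at `other`. Because the assignment has no visible effect, a compiler is free to drop it entirely.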

You can, however, pass a pointer to the value you want to change:

__device__ void myfunction(float** Args,short* param)

{

    *Args=&somearray[0];

}

extern "C" __global__ void myKernel(float* arrayfloat)

{

    int num_thread=blockIdx.x * blockDim.x + threadIdx.x;

    if (num_thread==0)

    {

        myfloatArray=arrayfloat;

        short param=1;

        myfunction(&myfloatArray,&param);

    }

}
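As a possible alternative, since CUDA device code is C++, the pointer could also be taken by reference instead of through a pointer-to-pointer. The sketch below is host-side C++ so it can be tried without a GPU. `new_storage`, `set_via_pointer`, `set_via_reference` and `both_update_caller` are hypothetical names chosen for illustration, not part of the original code.

```cpp
#include <cassert>

// Hypothetical stand-in for the array that the pointer should be redirected to.
static float new_storage[4] = {1.0f, 2.0f, 3.0f, 4.0f};

// Pointer-to-pointer version: same idea as the corrected myfunction above.
void set_via_pointer(float** args) {
    *args = &new_storage[0];
}

// Reference version: `args` is an alias for the caller's pointer variable.
void set_via_reference(float*& args) {
    args = &new_storage[0];
}

// Returns true if both variants redirected the caller's pointers.
bool both_update_caller() {
    float other[2] = {9.0f, 8.0f};
    float* a = other;
    float* b = other;
    set_via_pointer(&a);   // the caller passes the address of its pointer
    set_via_reference(b);  // the caller passes the pointer variable itself
    assert(a != other && b != other);
    return a == new_storage && b == new_storage;
}
```

In device code the reference variant would be declared along the lines of `__device__ void myfunction(float*& Args, short* param)` and called as `myfunction(myfloatArray, &param)`. Which form to use is mostly a matter of taste: the pointer-to-pointer version makes the modification visible at the call site.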

Thanks a lot, I just realized that I was passing a copy of the pointer… as I said, newbie :)