Why not just use a loop and skip the fancy templates?
Efficiency should be the same: since VECTOR_SIZE is known at compile time, the CUDA compiler will unroll the loop anyway.
BTW there’s nothing wrong with using templates (I use metaprogramming in CUDA for a couple applications) but here it just feels like you’re complicating a simple initialization unnecessarily.
Not sure, since I’m a newbie, but I think your problem happens because you are using recursion and the compiler may not be able to handle it, since it tries to inline every call to a __device__ function. You can try __noinline__ to check whether this is the case.
My real problem is more complicated; I just simplified it to something that illustrates my question. I know that in this situation I could use a simple loop, but I would like to pass my shared data as an argument to another function, at least to understand CUDA better and at best to reduce processing time. Thank you for your help!
I did what you asked in a correct main.cu (already tested), and when I ran the compilation (I am on Windows with Visual Studio 2005) I got this:
I tested everything you suggested, but it doesn’t work very well: sometimes the compiler tells me that not all paths return a value (because of the return inside your if), or, worse, the compiler never finishes at all, so…
However, I did get something that works:
template<int i> __device__ uchar setSmem( uchar* sMat, int tx, int ty )
{ return sMat[i] = tx + ty + setSmem<i-1>( sMat, tx, ty ); }

template<> __device__ uchar setSmem<-1>( uchar* sMat, int tx, int ty )
{ return 0; }