{
    idx = … // curr_thread
    __shared__ int w = 0;
    if (idx < 10) w = 5;
    __syncthreads();
    parametr_to_show = w;
}
This code doesn't work, but if I use a parameter from the function header instead of w, it works. Why is that?
I don’t know what “this code don’t work” means as I don’t know what you expected the code to do, but likely you want this code instead:
{
    idx = .. // curr_thread
    __shared__ int w = 0;
    __syncthreads();
    if (idx < 10) w = 5;
    __syncthreads();
    parametr_to_show = w;
}
tera, I mean that the value of w is not changed after this code.
It doesn't work with your code either …
I can't change the shared value w with code like the above …
Sorry, I don’t understand what you mean. Can you post a complete, self-contained example together with the results you expect?
__global__ void fck_()
{
    idx = .. // curr_thread
    __shared__ int w;
    if (idx < 10) { w = 5; }
    __syncthreads();
    parametr_to_show = w;
}
then printf(parametr_to_show) shows zero …
but with
__global__ void fck_()
{
    idx = .. // curr_thread
    __shared__ int w;
    if (idx < 10) { w = 5; parametr_to_show = w; }
}
printf(parametr_to_show) shows 5 …
I'm asking why w (parametr_to_show) equals 0 (and not 5) in the first case … ???
And how can I use w in all threads (not only when idx < 10)?
I want to load data into shared memory and reuse it… but the values are not changed, as I described …
Maybe I need to use the extern qualifier or something else?
tera, do you understand what I'm talking about? Do you know how to solve this problem? Or does somebody else?
Can you post a complete example that I could run on my GPU? From just these fragments, even including an ellipsis to mark an omission, it is hard to tell.
Hi There,
since you are expecting or hoping that the global variable parametr_to_show has the value 5, you are making assumptions about the order of thread execution, i.e. you are expecting the last thread to execute to have an idx value of less than 10. You should not make any assumptions about thread launch order or about the order of execution of threads after divergent branches.
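To illustrate the usual pattern (a minimal sketch, not the poster's exact kernel: the kernel name and the out pointer are mine): one designated thread initializes the shared variable, a barrier makes the write visible, and only then does every thread in the block read it.

```
// Sketch: single-writer initialization of a __shared__ variable.
__global__ void broadcast_kernel(int *out)
{
    __shared__ int w;             // note: no initializer allowed here

    if (threadIdx.x == 0)         // exactly one thread writes...
        w = 5;
    __syncthreads();              // ...and the barrier publishes the write

    out[threadIdx.x] = w;         // every thread in the block now sees 5
}
```

Reading w before the barrier, or writing it from many diverging threads, leaves the observed value dependent on scheduling.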
Check out the webinar called CUDA Warps and Occupancy Considerations on the www.nvidia.com/webinars page
Justin Luitjens talks about this.
Good Luck !!
-Nadeem
Is there at least a delete function for posts?
What do you want to do with your code?
There are a few potential error causes:
Shared memory variables cannot be declared and initialized at the same time.
Use a single thread, guarded by an if condition, to initialize a single shared memory variable.
With your code, I assume 10 threads write the same value to the same address?
If the shared memory variable does not depend on input data and is not modified further, then use a constant register or pass it as an argument to the kernel.
Maybe your unique thread index calculation is wrong! Possibly no thread ever has an index smaller than 10? We can't tell without the equation.
No __syncthreads() is necessary directly below the shared memory declaration.
//declare shared variable
__shared__ int var_s;
//initialize shared variable with a single thread
if (threadIdx.x == 0)
    var_s = value;
//sync all threads of this block so the initialization is visible to all threads
__syncthreads();
//use shared memory variable in thread calculation
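For completeness, here is a hedged, self-contained variant along the same lines, with host code so it can be compiled and run as-is (the kernel name, the out array, and the launch configuration are my choices, not from the thread; every element of the output should end up as 5):

```
#include <cstdio>
#include <cuda_runtime.h>

// One thread initializes the __shared__ variable, __syncthreads() makes
// the write visible block-wide, then all threads use it.
__global__ void use_shared(int *out)
{
    __shared__ int w;

    if (threadIdx.x == 0)
        w = 5;                    // single writer avoids any race
    __syncthreads();              // initialization now visible to the block

    out[blockIdx.x * blockDim.x + threadIdx.x] = w;
}

int main()
{
    const int n = 32;
    int h_out[n];
    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));
    use_shared<<<1, n>>>(d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h_out[0] = %d, h_out[%d] = %d\n", h_out[0], n - 1, h_out[n - 1]);
    cudaFree(d_out);
    return 0;
}
```

If the value never depends on runtime data, passing it as a kernel argument (as suggested above) is simpler and avoids shared memory entirely.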