{
    idx = … // curr_thread
    __shared__ int w = 0;
    if (idx < 10) w = 5;
    __syncthreads();
    parametr_to_show = w;
}
This code doesn't work, but if I use a parameter from the function header instead of w, it works. Why is that?
I don’t know what “this code don’t work” means as I don’t know what you expected the code to do, but likely you want this code instead:
{
    idx = .. // curr_thread
    __shared__ int w = 0;
    __syncthreads();
    if (idx < 10) w = 5;
    __syncthreads();
    parametr_to_show = w;
}
tera, I mean that the value of w is not changed after this code.
It doesn't work with your code either …
I can't change the shared value w with code like the above …
Sorry, I don’t understand what you mean. Can you post a complete, self-contained example together with the results you expect?
__global__ void fck_()
{
    idx = .. // curr_thread
    __shared__ int w;
    if (idx < 10) { w = 5; }
    __syncthreads();
    parametr_to_show = w;
}
then printf(parametr_to_show) shows zero …
but with
__global__ void fck_()
{
    idx = .. // curr_thread
    __shared__ int w;
    if (idx < 10) { w = 5; parametr_to_show = w; }
}
printf(parametr_to_show) shows 5 …
I'm asking why w (parametr_to_show) equals 0 (and not 5) in the first case … ???
And how can I use w in all threads (not only when idx < 10)?
I want to load data into shared memory and reuse it… but the values are not changed, as I described …
Maybe I need to use the extern qualifier or something else?
tera, do you understand what I'm talking about? Do you know how to solve this problem? Or does somebody else?
Can you post a complete example that I could run on my GPU? From just these fragments, even including an ellipsis to mark an omission, it is hard to tell.
Hi There,
since you are expecting or hoping that the global variable parametr_to_show has the value 5, you are making assumptions about the order of thread execution, i.e. you are expecting the last thread to execute to have an idx value of less than 10. You should not make any assumptions about thread launch order or about the order of execution of threads after divergent branches.
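To illustrate the usual pattern (a minimal sketch, not the poster's exact kernel: the kernel name and the out pointer are mine): one designated thread initializes the shared variable, a barrier makes the write visible, and only then does every thread in the block read it.

```
// Sketch: single-writer initialization of a __shared__ variable.
__global__ void broadcast_kernel(int *out)
{
    __shared__ int w;             // note: no initializer allowed here

    if (threadIdx.x == 0)         // exactly one thread writes...
        w = 5;
    __syncthreads();              // ...and the barrier publishes the write

    out[threadIdx.x] = w;         // every thread in the block now sees 5
}
```

Reading w before the barrier, or writing it from many diverging threads, leaves the observed value dependent on scheduling.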
Check out the webinar called CUDA Warps and Occupancy Considerations on the www.nvidia.com/webinars page
Justin Luitjens talks about this.
Good Luck !!
-Nadeem
Is there at least a delete function for posts?
What do you want to do with your code?
There are a few potential error causes:
Shared memory variables cannot be declared and initialized at the same time.
Use a single thread, guarded by an if condition, to initialize a single shared memory variable.
With your code, I assume 10 threads write the same value to the same address?
If the shared memory variable does not depend on input data and is not modified further, then use a constant register or pass it as an argument to the kernel.
Maybe your unique thread index calculation is wrong! Possibly no thread ever has an index smaller than 10? We can't tell without the equation.
No __syncthreads() is necessary directly below the shared memory declaration.
//declare shared variable
__shared__ int var_s;
//initialize shared variable with a single thread
if (threadIdx.x == 0)
    var_s = value;
//sync all threads of this block so the initialization is visible to all threads
__syncthreads();
//use shared memory variable in thread calculation
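For completeness, here is a hedged, self-contained variant along the same lines, with host code so it can be compiled and run as-is (the kernel name, the out array, and the launch configuration are my choices, not from the thread; every element of the output should end up as 5):

```
#include <cstdio>
#include <cuda_runtime.h>

// One thread initializes the __shared__ variable, __syncthreads() makes
// the write visible block-wide, then all threads use it.
__global__ void use_shared(int *out)
{
    __shared__ int w;

    if (threadIdx.x == 0)
        w = 5;                    // single writer avoids any race
    __syncthreads();              // initialization now visible to the block

    out[blockIdx.x * blockDim.x + threadIdx.x] = w;
}

int main()
{
    const int n = 32;
    int h_out[n];
    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));
    use_shared<<<1, n>>>(d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h_out[0] = %d, h_out[%d] = %d\n", h_out[0], n - 1, h_out[n - 1]);
    cudaFree(d_out);
    return 0;
}
```

If the value never depends on runtime data, passing it as a kernel argument (as suggested above) is simpler and avoids shared memory entirely.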