// CUDA code
if (comm < d_g.nodes) {
    for (int node = d_community_pos[comm]; node < d_community_pos[comm + 1]; node++) {
        for (int neighbor = d_g.out_col[d_community_list[node]];
             neighbor < d_g.out_col[d_community_list[node] + 1];
             neighbor++) {
            int n_neig = d_g.child_out[neighbor];
            int neig_comm = d_p.node_comm[n_neig];
            d_count[d_fake_ids[neig_comm]] += 1;
        }
    }
}
Each thread (i) in this code should work on its own copy of d_count, one that other threads can neither see nor modify. In my case, however, all 5 threads modify the same copy and keep adding to the existing values. How can I solve this problem?
I tried declaring d_count in local memory, and that worked on small data. When I tested it on big data it failed, because local memory usage is limited: a thread cannot use more than 512 KB.
I also tried resetting d_count to zero for each i, but that did not work either.
Any suggestions on how to make d_count a private array for each thread without using local memory?
Any help from CUDA experts would be appreciated.
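
To make the goal concrete, here is a minimal sketch of one possible approach, not a confirmed fix: allocate a single large buffer in global memory with cudaMalloc and give each thread its own row of it, so d_count becomes a pointer into that buffer instead of a local array. Everything here is hypothetical and only for illustration: the names count_private, run_count_private, keys, per_thread, num_bins and d_count_all are made up, and the toy keys array stands in for the graph traversal above.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the question: each thread counts
// into its own row of one large buffer that lives in global memory.
__global__ void count_private(const int *keys,   // keys[t * per_thread + k]: bin hit by thread t
                              int per_thread,    // number of keys handled by each thread
                              int num_threads,   // number of logical threads (communities)
                              int num_bins,      // length of one private counter array
                              int *d_count_all)  // num_threads * num_bins ints, global memory
{
    int comm = blockIdx.x * blockDim.x + threadIdx.x;
    if (comm >= num_threads) return;

    // Private row of this thread; no other thread reads or writes it.
    int *d_count = d_count_all + (size_t)comm * num_bins;

    // Each thread clears only its own row before counting.
    for (int b = 0; b < num_bins; b++) d_count[b] = 0;

    for (int k = 0; k < per_thread; k++)
        d_count[keys[comm * per_thread + k]] += 1;
}

// Hypothetical host helper: runs the kernel and copies all rows back.
std::vector<int> run_count_private(const std::vector<int> &keys, int per_thread,
                                   int num_threads, int num_bins)
{
    int *d_keys = nullptr;
    int *d_count_all = nullptr;
    size_t count_bytes = (size_t)num_threads * num_bins * sizeof(int);

    cudaMalloc(&d_keys, keys.size() * sizeof(int));
    cudaMalloc(&d_count_all, count_bytes);
    cudaMemcpy(d_keys, keys.data(), keys.size() * sizeof(int), cudaMemcpyHostToDevice);

    int block = 128;
    int grid = (num_threads + block - 1) / block;
    count_private<<<grid, block>>>(d_keys, per_thread, num_threads, num_bins, d_count_all);
    cudaError_t err = cudaGetLastError();
    assert(err == cudaSuccess);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> out((size_t)num_threads * num_bins);
    cudaMemcpy(out.data(), d_count_all, count_bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_keys);
    cudaFree(d_count_all);
    return out;
}

int main()
{
    // Toy input: 5 threads, 3 keys each, 4 bins per private array.
    std::vector<int> keys = {0, 0, 1,  1, 1, 1,  2, 3, 3,  0, 1, 2,  3, 3, 3};
    std::vector<int> out = run_count_private(keys, 3, 5, 4);
    for (int t = 0; t < 5; t++)
        printf("thread %d: %d %d %d %d\n", t,
               out[t * 4], out[t * 4 + 1], out[t * 4 + 2], out[t * 4 + 3]);
    return 0;
}
```

The cost of this layout is num_threads * num_bins * sizeof(int) bytes of global memory, so it only helps if that product fits on the device. If it does not, the same idea could be applied to a batch of communities at a time, reusing the buffer between kernel launches.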