I am a CUDA beginner and this is a question about the basics, so I'd appreciate any help. I'm trying to use concurrent ant colony optimization to solve the TSP.
I have a graph with n cities and n^2 edge weights. The matrix of edge weights is kept in global memory. I launch a kernel with m threads, one per ant (the grid may contain one block or many, whichever helps). Every ant has generated a tour, and now it's time to update the pheromone matrix P[n][n] (like the edge matrix, it's allocated in global device memory). The pheromone value for every edge is a float. Every ant, running as a separate thread, needs to perform an addition on some values of P.
Now, how can many ants safely add to the same value P[i][j] at the same time without losing updates? If I could, I'd use a float version of atomicAdd(), but as far as I can tell there isn't such a function.
I've found a suggestion on the forum for implementing atomicAdd on floats, but it's a very expensive workaround. Could you point me to an elegant and fast solution? I've read the Programming Guide, but I can't tell which of the tools leads to a solution in the simplest way.
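To make the question concrete, here is a minimal sketch of the kind of update step described above, using a compare-and-swap loop built on the integer `atomicCAS()` to emulate a float add. This is one commonly seen form of such a workaround and not necessarily the one from the forum thread. The names `atomicAddFloatCAS`, `depositPheromone`, `depositDemo`, `tours`, and `delta` are hypothetical, as is the assumption that each ant stores its tour as n city indices and deposits a single amount `delta[ant]` on every edge of that tour. It also assumes a device that supports `atomicCAS()` on 32-bit words in global memory; on devices of compute capability 2.0 or higher, `atomicAdd()` has a native float overload and the loop should not be needed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: emulates a float atomic add with a compare-and-swap loop.
// Returns the value that was stored at *address before the add.
__device__ float atomicAddFloatCAS(float* address, float val)
{
    int* addrAsInt = reinterpret_cast<int*>(address);
    int old = *addrAsInt;
    int assumed;
    do {
        assumed = old;
        int desired = __float_as_int(__int_as_float(assumed) + val);
        // Writes 'desired' only if the word still equals 'assumed'.
        old = atomicCAS(addrAsInt, assumed, desired);
    } while (old != assumed);  // another thread got in first, so retry
    return __int_as_float(old);
}

// One thread per ant. tours holds m tours of n city indices each (assumed layout);
// delta[ant] is the amount that ant deposits on every edge of its tour.
__global__ void depositPheromone(const int* tours, const float* delta,
                                 float* P, int n, int m)
{
    int ant = blockIdx.x * blockDim.x + threadIdx.x;
    if (ant >= m) return;
    const int* tour = tours + ant * n;
    for (int k = 0; k < n; ++k) {
        int i = tour[k];
        int j = tour[(k + 1) % n];  // last city connects back to the first
        atomicAddFloatCAS(&P[i * n + j], delta[ant]);
    }
}

// Host-side demo: all m ants walk the tour 0,1,...,n-1 and deposit 1.0f per edge.
// Returns P[0][1] after the kernel has finished. Error checking is omitted.
float depositDemo(int n, int m)
{
    std::vector<int> hTours(n * m);
    for (int ant = 0; ant < m; ++ant)
        for (int k = 0; k < n; ++k)
            hTours[ant * n + k] = k;
    std::vector<float> hDelta(m, 1.0f);

    int* dTours = 0;
    float* dDelta = 0;
    float* dP = 0;
    cudaMalloc((void**)&dTours, n * m * sizeof(int));
    cudaMalloc((void**)&dDelta, m * sizeof(float));
    cudaMalloc((void**)&dP, n * n * sizeof(float));
    cudaMemcpy(dTours, hTours.data(), n * m * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dDelta, hDelta.data(), m * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dP, 0, n * n * sizeof(float));  // all-zero bits are 0.0f

    int threads = 128;
    int blocks = (m + threads - 1) / threads;
    depositPheromone<<<blocks, threads>>>(dTours, dDelta, dP, n, m);

    float result = 0.0f;
    // Copy back P[0][1]; this blocking copy waits for the kernel to finish.
    cudaMemcpy(&result, dP + 1, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dTours);
    cudaFree(dDelta);
    cudaFree(dP);
    return result;
}
```

With this setup, `depositDemo(8, 256)` has 256 ants all adding 1.0f to the same entries, so P[0][1] should come back as 256.0f if no update is lost. The cost is that every collision on the same P[i][j] forces a retry, which is presumably why this approach gets expensive when many ants share edges.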