concurrent float read/writes

Hello everyone!

I am a CUDA beginner and this is a question about the basics, so I’d appreciate any help. I’m trying to use a concurrent ant colony optimization to solve the TSP.

I have a graph with n cities and n^2 edge weights. The matrix of edge weights is kept in global memory. I launch a kernel with m threads, one per ant (the grid may contain one block or many, whichever helps). Every ant has generated a tour, and now it’s time to update the pheromone matrix P[n][n] (like the edge matrix, it’s allocated in global device memory). The pheromone value for every edge is a float. Every ant, running as a separate thread, needs to perform an addition on some values of P.

Now, how can many ants safely add to the same value P[i][j] at the same time? If I could, I’d use a float version of atomicAdd(), but as far as I can tell there isn’t such a function.

I’ve found a suggestion on the forum for emulating atomicAdd on floats, but it’s a very expensive workaround. Could you point me to an elegant and fast solution? I’ve read the Programming Guide, but I can’t tell which of the tools leads to a solution in the simplest way.


The weights are linear, so use a fixed-point representation… all you need for those is the fire-and-forget integer atomics (the ones whose return value you ignore).
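
To make the fixed-point idea concrete, here is a minimal sketch rather than a definitive implementation. It assumes pheromone is stored as unsigned 32-bit integers where 65536 units equal 1.0, that each ant deposits one amount on every directed edge of its tour, and that tours are stored row-major as m rows of n city indices. The names FIXED_SCALE, depositPheromone, fixedToFloat and depositOnDevice are made up for illustration; none of them come from the original post.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Assumed scale (hypothetical choice): 1.0 of pheromone == 65536 fixed-point units.
// With 32-bit unsigned storage, each entry must stay below 65536.0 or it wraps.
#define FIXED_SCALE 65536.0f

// One thread per ant. tours holds m tours of n city indices each (row-major);
// deposit[ant] is the amount that ant adds to every edge of its tour.
__global__ void depositPheromone(const int* tours, const float* deposit,
                                 unsigned int* P_fixed, int n, int m)
{
    int ant = blockIdx.x * blockDim.x + threadIdx.x;
    if (ant >= m) return;

    // Convert the float deposit to fixed point once, rounding to nearest.
    unsigned int delta = (unsigned int)(deposit[ant] * FIXED_SCALE + 0.5f);

    const int* tour = tours + ant * n;
    for (int k = 0; k < n; ++k) {
        int i = tour[k];
        int j = tour[(k + 1) % n];  // wrap around to close the tour
        // Integer atomic add; the return value is ignored ("fire and forget").
        // For a symmetric TSP one would also add to P_fixed[j * n + i].
        atomicAdd(&P_fixed[i * n + j], delta);
    }
}

// Convert the accumulated fixed-point matrix back to floats.
__global__ void fixedToFloat(const unsigned int* P_fixed, float* P, int count)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) P[idx] = (float)P_fixed[idx] / FIXED_SCALE;
}

// Hypothetical host helper: starts from a zeroed matrix, runs both kernels,
// and returns the n*n float pheromone matrix. Error checking is omitted.
std::vector<float> depositOnDevice(const std::vector<int>& tours,
                                   const std::vector<float>& deposit,
                                   int n, int m)
{
    size_t cells = (size_t)n * n;
    int* dTours; float* dDeposit; float* dP; unsigned int* dFixed;
    cudaMalloc(&dTours, tours.size() * sizeof(int));
    cudaMalloc(&dDeposit, deposit.size() * sizeof(float));
    cudaMalloc(&dFixed, cells * sizeof(unsigned int));
    cudaMalloc(&dP, cells * sizeof(float));
    cudaMemcpy(dTours, tours.data(), tours.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dDeposit, deposit.data(), deposit.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemset(dFixed, 0, cells * sizeof(unsigned int));

    int threads = 128;
    depositPheromone<<<(m + threads - 1) / threads, threads>>>(
        dTours, dDeposit, dFixed, n, m);
    fixedToFloat<<<(int)((cells + threads - 1) / threads), threads>>>(
        dFixed, dP, (int)cells);

    std::vector<float> P(cells);
    // This blocking copy also waits for the kernels above to finish.
    cudaMemcpy(P.data(), dP, cells * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dTours); cudaFree(dDeposit); cudaFree(dFixed); cudaFree(dP);
    return P;
}
```

The scale is a trade-off: a larger FIXED_SCALE gives finer resolution but less headroom before the 32-bit sum overflows, so it should be picked from the largest total pheromone an edge can receive. Note also that devices of compute capability 2.0 and later provide a native atomicAdd() overload for float in global memory, so on such hardware the fixed-point detour may not be needed at all.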