How do I count how many times an operation is performed when all threads are executing at the same time?

I am really new to CUDA and would really appreciate your help. I have found an interesting and awesome world, but also a huge one that is difficult to get started in.
I have to recode a LabVIEW program that analyses images. The images show diffraction rings, much like the ripples that appear when you throw a stone into water. The center of the image coincides with the origin of the diffraction rings.

The first thing I have to do is analyse these rings: I have to create a radial profile, which means I need to find every pixel at the same radius (ring), add up the intensities of all those pixels, and then compute their mean.
I think this problem is highly parallelizable, but I'm running into problems because of my lack of experience :(

I wrote a kernel in which each thread accesses one pixel, works out its radius (distance from the center, using its position in the 2D image array), and adds the pixel's intensity to an output array indexed by radius. Something similar to:

intensities[r]= intensities[r]+ image[tid];

In my sequential code I used to generate two arrays: one with the sum of all intensities for each radius, and another with a count of the values that were summed for each radius. I then divided the intensities by the number of pixels to obtain the mean intensity for each ring.

So each time I add a value, I also increment this counter in another output array, likewise indexed by radius:

counter[r] += 1;

But what I find now is that this counter is always 1, I think because all the threads are accessing the image at the same time.
So how can I solve this problem? How can I know how many pixels are being taken into account, so that I can compute the mean?

Thanks for your help :) :) :)

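If every thread updates a single variable shared by the entire program, it is going to be a performance killer. To do this the way you describe, you need to use the atomic add operation (atomicAdd in CUDA), because a plain read-modify-write from many threads at once loses updates.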
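
A minimal sketch of the atomic approach, assuming a float image stored row by row, rings defined by rounding each pixel's distance to the nearest integer, and the ring center at the geometric center of the image. The names radialSumsAtomic, radialProfileAtomic, counts and numBins are hypothetical and not from the original code. atomicAdd on float needs compute capability 2.0 or newer, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical kernel: one thread per pixel. atomicAdd serializes the updates
// that land on the same ring, so no addition or count is lost.
__global__ void radialSumsAtomic(const float* image, int width, int height,
                                 float cx, float cy, int numBins,
                                 float* intensities, unsigned int* counts)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= width * height) return;

    float dx = (tid % width) - cx;
    float dy = (tid / width) - cy;
    int r = (int)(sqrtf(dx * dx + dy * dy) + 0.5f);  // nearest integer radius (assumed binning)
    if (r >= numBins) return;

    atomicAdd(&intensities[r], image[tid]);  // sum of intensities on ring r
    atomicAdd(&counts[r], 1u);               // number of pixels on ring r
}

// Hypothetical host wrapper: returns the mean intensity per ring (0 for empty rings).
std::vector<float> radialProfileAtomic(const std::vector<float>& image,
                                       int width, int height, int numBins)
{
    const int n = width * height;
    float* dImage = nullptr;
    float* dSums = nullptr;
    unsigned int* dCounts = nullptr;
    cudaMalloc(&dImage, n * sizeof(float));
    cudaMalloc(&dSums, numBins * sizeof(float));
    cudaMalloc(&dCounts, numBins * sizeof(unsigned int));
    cudaMemcpy(dImage, image.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dSums, 0, numBins * sizeof(float));           // all-zero bytes are 0.0f
    cudaMemset(dCounts, 0, numBins * sizeof(unsigned int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    radialSumsAtomic<<<blocks, threads>>>(dImage, width, height,
                                          (width - 1) * 0.5f, (height - 1) * 0.5f,
                                          numBins, dSums, dCounts);

    std::vector<float> sums(numBins);
    std::vector<unsigned int> counts(numBins);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(sums.data(), dSums, numBins * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(counts.data(), dCounts, numBins * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dImage);
    cudaFree(dSums);
    cudaFree(dCounts);

    for (int r = 0; r < numBins; ++r)
        sums[r] = counts[r] ? sums[r] / counts[r] : 0.0f;  // mean per ring
    return sums;
}
```

With many pixels per ring the atomics on a given bin still serialize, which is why this is usually the simplest correct version rather than the fastest one.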

Better options would be to give every thread its own copy of those variables, or to design things so that each thread only accesses one item in the array. I have fought this battle myself, and those are the ways I dealt with the problem.
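
One possible reading of "each thread only accesses one item in the array" is to launch one thread per ring instead of one per pixel. The sketch below assumes the same row-by-row float image and nearest-integer radius binning as the question implies; radialMeanPerRing, radialProfilePerRing and numBins are hypothetical names, and error checking is omitted. Each thread scans the whole image, so this trades extra reads for having no write conflicts at all; it is an illustration of the idea, not a tuned implementation.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical kernel: one thread per ring. Thread r is the only writer of
// means[r], and its sum and count live in private registers, so no atomics are needed.
__global__ void radialMeanPerRing(const float* image, int width, int height,
                                  float cx, float cy, int numBins, float* means)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= numBins) return;

    float sum = 0.0f;        // private to this thread
    unsigned int count = 0;  // private to this thread
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float dx = x - cx;
            float dy = y - cy;
            int pr = (int)(sqrtf(dx * dx + dy * dy) + 0.5f);  // nearest integer radius (assumed binning)
            if (pr == r) {
                sum += image[y * width + x];
                ++count;
            }
        }
    }
    means[r] = count ? sum / count : 0.0f;  // 0 for rings with no pixels
}

// Hypothetical host wrapper: returns the mean intensity per ring.
std::vector<float> radialProfilePerRing(const std::vector<float>& image,
                                        int width, int height, int numBins)
{
    const int n = width * height;
    float* dImage = nullptr;
    float* dMeans = nullptr;
    cudaMalloc(&dImage, n * sizeof(float));
    cudaMalloc(&dMeans, numBins * sizeof(float));
    cudaMemcpy(dImage, image.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (numBins + threads - 1) / threads;
    radialMeanPerRing<<<blocks, threads>>>(dImage, width, height,
                                           (width - 1) * 0.5f, (height - 1) * 0.5f,
                                           numBins, dMeans);

    std::vector<float> means(numBins);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(means.data(), dMeans, numBins * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dImage);
    cudaFree(dMeans);
    return means;
}
```

The other option mentioned above, a private copy of the sums and counts per thread (or per block), needs a final reduction step to combine the copies and is not shown here.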