Problem with atomic functions

I’m learning CUDA C and I am trying to solve a problem with my program.

I have a vector of length N. I generate a random number ‘r’ that represents a vector index, and I should add 1 to the element at that index. I want to do this with N threads: each thread should generate its own random number and add 1 at that index. This must be synchronized so that each thread adds 1 to the current value and not to a stale one, since several threads can pick the same index. I did this with the atomic function “atomicAdd”, but atomic functions are very costly. If possible, I would like to know whether there is another way to do this without using atomic functions.
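
A minimal sketch of the approach described above, assuming the cuRAND device API is used to generate the random index. The names randomIncrement and counts, the seed value, the vector size, and the launch configuration are illustrative assumptions and are not taken from my actual program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread draws one random index r in [0, n) and adds 1 to counts[r].
// (Kernel name and parameters are hypothetical, for illustration only.)
__global__ void randomIncrement(unsigned int *counts, unsigned int n,
                                unsigned long long seed)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // One independent random sequence per thread.
    curandState state;
    curand_init(seed, tid, 0, &state);

    // Assumption: modulo is good enough here (it introduces a slight bias).
    unsigned int r = curand(&state) % n;

    // The read-modify-write happens as one indivisible step, so two threads
    // that pick the same r cannot overwrite each other's update.
    atomicAdd(&counts[r], 1u);
}

int main()
{
    const unsigned int n = 1u << 20;            // assumed vector length N
    const size_t bytes = n * sizeof(unsigned int);

    unsigned int *d_counts = nullptr;
    if (cudaMalloc(&d_counts, bytes) != cudaSuccess) return 1;
    cudaMemset(d_counts, 0, bytes);

    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;
    randomIncrement<<<blocks, threads>>>(d_counts, n, 1234ULL);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Sanity check: every thread added exactly 1, so the sum must equal n.
    unsigned int *h_counts = (unsigned int *)malloc(bytes);
    cudaMemcpy(h_counts, d_counts, bytes, cudaMemcpyDeviceToHost);
    unsigned long long total = 0;
    for (unsigned int i = 0; i < n; ++i) total += h_counts[i];
    printf("total = %llu, n = %u\n", total, n);

    free(h_counts);
    cudaFree(d_counts);
    return total == n ? 0 : 1;
}
```

If the atomicAdd line is replaced with a plain counts[r] += 1, the sum check at the end can come out smaller than n, which is the lost-update problem I am trying to avoid.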

I hope you understand what I want to tell you.
Please help me.

Can someone help me?