I need to build a set/map of item frequencies using multiple threads spread over several blocks.

First attempt: I tried to create a critical section, but it did not work and the code locked up. Apparently, if more than 3 SMs are in use, the critical section deadlocks.

https://stackoverflow.com/questions/18963293/cuda-atomics-change-flag/18968893#18968893

Example: 1 2 4 2 5 6 2 1 3 5 6
Frequency
1=2
2=3
3=1
4=1
5=2
6=2

Problem: the logic is simple. A list of <String ID, int value> entries starts out blank. If the element does not exist yet, it is added; if it already exists, its support is added to the support of the entry found. The trouble starts when all threads read the first empty position at the same time and every thread writes its element into the first position of the list. As I said, I could not get a critical section to work; I believe I can build something with one of the atomic functions, such as atomicCAS.

The source code I want to create is this:

auto indexAtual = blockIdx.x * blockDim.x + threadIdx.x;
if (indexAtual < x) {
    int index = 0;
    bool newFlag = true;
    // Walk the list until this item is either inserted or merged.
    while (newFlag) {
        if (0 == compare(new_canditatos_cont_suporte[index]->ItemId, "")) {
            // Empty slot: insert the item here.
            // Not thread-safe: several threads can see the same empty slot.
            (*new_canditatos_cont_suporte_size)++;
            my_strcpy(new_canditatos_cont_suporte[index]->ItemId, new_canditatos[indexAtual]->ItemId);
            new_canditatos_cont_suporte[index]->indexArrayMap = new_canditatos[indexAtual]->indexArrayMap;
            new_canditatos_cont_suporte[index]->suporte = new_canditatos[indexAtual]->suporte;
            newFlag = false;
        } else if (0 == compare(new_canditatos_cont_suporte[index]->ItemId, new_canditatos[indexAtual]->ItemId)) {
            // Same item already present: accumulate its support.
            new_canditatos_cont_suporte[index]->suporte += new_canditatos[indexAtual]->suporte;
            newFlag = false;
        } else {
            // Slot holds a different item: try the next one.
            index++;
        }
    }
}

Use cooperative groups, specifically a cooperative kernel launch. You can then do a grid-wide sync which will be an essential component of any grid-wide critical section.
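
A minimal sketch of such a launch, assuming a device that supports cooperative launch and a grid small enough for all of its blocks to be resident at once. The kernel `two_phase` and the host helper `run_two_phase` are names made up for illustration; they are not from your code. It only shows the grid-wide barrier, not a full critical section:

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Phase 1: each thread writes one element.
// Phase 2: each thread reads an element written by another thread,
// possibly in another block, which is only safe after grid.sync().
__global__ void two_phase(int *buf, int *out, int n) {
    cg::grid_group grid = cg::this_grid();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 2 * i;
    grid.sync();  // barrier across every block in the grid
    if (i < n) out[i] = buf[n - 1 - i];
}

// Hypothetical host helper: launches the kernel cooperatively and
// returns the contents of out (empty vector if the launch fails).
std::vector<int> run_two_phase(int n, int threads) {
    int *buf = nullptr, *out = nullptr;
    cudaMalloc(&buf, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));

    int blocks = (n + threads - 1) / threads;
    void *args[] = {&buf, &out, &n};
    // grid.sync() is only valid when the kernel is launched this way.
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void *)two_phase, dim3(blocks), dim3(threads), args, 0, 0);

    std::vector<int> host;
    if (err == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess) {
        host.resize(n);
        cudaMemcpy(host.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(buf);
    cudaFree(out);
    return host;
}
```

Calling `run_two_phase(1024, 256)` launches 4 blocks. The launch fails if the grid is too large for all blocks to be co-resident, so the grid size normally has to be derived from an occupancy query. Depending on the toolkit version, you may also need to compile with `-rdc=true`.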

However, as you mention, what you want to do may be easier to solve with careful use of atomics. I’m not entirely sure of your description, but maybe what you are after is a thread-safe vector push_back. In that case, something like this may work for you:

https://stackoverflow.com/questions/21786495/cuda-kernel-returning-vectors/21788662#21788662
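
The idea, roughly, is that each thread reserves its own slot with atomicAdd on a shared counter, so no two threads ever write to the same position. The sketch below is my own illustration rather than the code from that answer; `push_back_even` and `collect_even` are hypothetical names, and the "keep only even values" condition just stands in for whatever decides that a thread has something to append:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Threads with something to append reserve a unique slot via atomicAdd.
__global__ void push_back_even(const int *in, int n,
                               int *out, int *out_count, int capacity) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] % 2 == 0) {
        int slot = atomicAdd(out_count, 1);  // returns the previous count
        if (slot < capacity) out[slot] = in[i];
    }
}

// Hypothetical host helper: returns the appended values.
// Their order is not deterministic.
std::vector<int> collect_even(const std::vector<int> &h_in) {
    int n = static_cast<int>(h_in.size());
    int *d_in = nullptr, *d_out = nullptr, *d_count = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    push_back_even<<<blocks, threads>>>(d_in, n, d_out, d_count, n);

    int count = 0;
    cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<int> h_out(count);
    cudaMemcpy(h_out.data(), d_out, count * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_count);
    return h_out;
}
```

This gives each element its own slot, but it does not merge duplicates, so on its own it only covers the "add if new" half of your loop.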

Adding with a push_back is no problem. The problem is the string comparison, which cannot be done safely while other threads are writing at the same time:

0 == compare(new_canditatos_cont_suporte[index]->ItemId, "")

The difference in strings may come about because there is no prescribed order for CUDA thread execution, whereas your serial code always follows the same path.
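
One way to avoid comparing strings concurrently is to claim slots with atomicCAS on an integer key. The sketch below assumes each ItemId has already been mapped to a non-zero integer key (for example a dictionary index); that mapping is my assumption and is not in your code. The names `count_support`, `run_count_support` and `EMPTY_KEY` are also hypothetical:

```cuda
#include <cuda_runtime.h>
#include <map>
#include <vector>

#define EMPTY_KEY 0  // assumption: 0 is never a real item key

// Insert-or-accumulate: the first thread to reach an empty slot claims it
// with atomicCAS; threads carrying the same key add their support to it.
__global__ void count_support(const int *keys, const int *support, int n,
                              int *table_keys, int *table_support, int capacity) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int key = keys[i];
    for (int slot = 0; slot < capacity; ++slot) {
        // Writes key only if the slot is empty; returns the old content.
        int prev = atomicCAS(&table_keys[slot], EMPTY_KEY, key);
        if (prev == EMPTY_KEY || prev == key) {
            atomicAdd(&table_support[slot], support[i]);
            return;
        }
        // The slot belongs to a different key: try the next one.
    }
}

// Hypothetical host helper: returns key -> total support.
std::map<int, int> run_count_support(const std::vector<int> &keys,
                                     const std::vector<int> &support) {
    int n = static_cast<int>(keys.size());
    int *d_keys, *d_sup, *d_tk, *d_ts;
    cudaMalloc(&d_keys, n * sizeof(int));
    cudaMalloc(&d_sup, n * sizeof(int));
    cudaMalloc(&d_tk, n * sizeof(int));
    cudaMalloc(&d_ts, n * sizeof(int));
    cudaMemcpy(d_keys, keys.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_sup, support.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_tk, 0, n * sizeof(int));  // all slots start as EMPTY_KEY
    cudaMemset(d_ts, 0, n * sizeof(int));

    count_support<<<(n + 127) / 128, 128>>>(d_keys, d_sup, n, d_tk, d_ts, n);

    std::vector<int> tk(n), ts(n);
    cudaMemcpy(tk.data(), d_tk, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(ts.data(), d_ts, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_keys); cudaFree(d_sup); cudaFree(d_tk); cudaFree(d_ts);

    std::map<int, int> result;
    for (int s = 0; s < n; ++s)
        if (tk[s] != EMPTY_KEY) result[tk[s]] = ts[s];
    return result;
}
```

With the keys 1 2 4 2 5 6 2 1 3 5 6 and a support of 1 for each, this should reproduce the frequencies from your example. Which slot each key lands in can change from run to run, so compare results by key rather than by position.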