parallel calculate character frequency in a string

Anybody have idea about how to find the frequency of each char in a string?
For example: In aabbcd, I need to figure out frequency of characters: a: 2, b: 2, c: 1, d: 1

My implementation is as below, but it doesn’t work, and there are multiple threads waiting for writing distribution[index], it is not so efficient.
Does anybody have better idea? Thanks a lot

void countblock(unsigned char *d_buf, unsigned int length, unsigned *output )
unsigned tid = threadIdx.x +
blockIdx.x * blockDim.x +
blockIdx.y * blockDim.x * gridDim.x;

unsigned char index = d_buf[tid];

atomicAdd(&distribution[index], 1);

if(tid < 257)
output[tid] = distribution[tid];




you can take a look at this:

I guess you have 3 different approaches:

  1. Atomic on global

  2. Atomic on shared

  3. Sort -> Run Length Coding ( two kernels and no use of atomic)

If you know anything about your data-set you could improve the algorithm.

Is it possible for the string to contain all possible characters? or only alphabetic?

One thing to notice is that the ASCII ‘a’ is not 0, so in your example you need to allocate the memory needed otherwise you will hit outside the array.

In your example “aabbcd” you will get [2,2,1,1] in position [97,98,99,100].

As brano already said, your problem looks a lot like a histogram problem. So you can also take a look at our paper about GPU histogramming available via: You can find an example implementation on our website:

Sort the string, and then do a keyed reduction. Here’s some example code using thrust [1]:

#include <thrust/device_vector.h>

#include <thrust/sort.h>

#include <thrust/reduce.h>

#include <thrust/iterator/constant_iterator.h>

#include <string>

#include <iostream>

int main()


  using namespace thrust;

  std::string h_input = "aabbcd";

// move the input string to the device

  device_vector<char> d_input(h_input.begin(), h_input.end());

// sort the input to bring similar keys together

  sort(d_input.begin(), d_input.end());

// allocate storage for the resulting histogram

  device_vector<char> d_keys(d_input.size());

  device_vector<int>  d_count(d_input.size());

// count the frequency of each character key

  pair<device_vector<char>::iterator, device_vector<int>::iterator> new_end = 

    reduce_by_key(d_input.begin(), d_input.end(),




// eliminate extra unneeded storage in histogram

  d_keys.erase(new_end.first, d_keys.end());

  d_count.erase(new_end.second, d_count.end());

// print the result

  std::cout << "character frequency:" << std::endl;

  for(int i = 0; i < d_keys.size(); ++i)


    std::cout << d_keys[i] << ": " << d_count[i] << std::endl;



Here’s the output:

yucky:~ jared$ nvcc -run

character frequency:

a: 2

b: 2

c: 1

d: 1


Thank you all, your suggestions are helpful :)