I’m quite new to CUDA programming and I have a question: is it possible to get an identifier for the thread currently running on the device? Ideally, this identifier would be unique across blocks (it does not need to be unique across devices) and would lie between 0 and the maximum number of threads that can run on the device.
The reason for this is the following:
I’m trying to port a sequential algorithm to CUDA. In this algorithm, I need to update counters. What I would like is for each thread to have its own set of counters, so that at the end I only need to sum the per-thread counters. This means that if I have k counters, I would allocate a matrix of k*nbthreads counters. The problem is that I need a lot of counters (~60000), and the number of threads can also be quite large (much more than the maximum number of threads allowed on the device). This is why I was thinking that if I could map each thread to an identifier between 0 and the maximum number of threads, my matrix would be much smaller and could fit in the device memory.
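To make the layout concrete, here is a rough sketch of what I have in mind. It rests on several assumptions: I launch a fixed-size 1D grid and let each thread loop over several work items, and I use `blockIdx.x * blockDim.x + threadIdx.x` as the identifier. That index is only unique within one launch and is bounded by the grid size I choose, not by the device, so I am not sure it is the right approach. `pick_counter`, `count_kernel`, `sum_kernel` and `count_on_device` are hypothetical names, and the update rule is just a placeholder for the real algorithm.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(1);
    }
}

// Hypothetical placeholder for the real algorithm: which counter item i updates.
__device__ int pick_counter(int item, int numCounters) {
    return item % numCounters;
}

// Each thread owns one row of the (numThreads x numCounters) matrix,
// so no two threads ever write to the same counter.
__global__ void count_kernel(int numItems, int numCounters, unsigned int* counters) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // unique within this launch
    int numThreads = gridDim.x * blockDim.x;          // fixed by the launch configuration
    unsigned int* mine = counters + (size_t)tid * numCounters;
    // Stride over the items so a small grid can cover any number of them.
    for (int i = tid; i < numItems; i += numThreads) {
        mine[pick_counter(i, numCounters)] += 1;
    }
}

// One thread per counter: add up that counter's column over all rows.
__global__ void sum_kernel(int numThreads, int numCounters,
                           const unsigned int* counters, unsigned int* totals) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numCounters) return;
    unsigned int s = 0;
    for (int t = 0; t < numThreads; ++t) {
        s += counters[(size_t)t * numCounters + c];
    }
    totals[c] = s;
}

// Host wrapper: allocate the matrix, count, reduce, and copy the totals back.
std::vector<unsigned int> count_on_device(int numItems, int numCounters,
                                          int blocks, int threadsPerBlock) {
    const int numThreads = blocks * threadsPerBlock;
    const size_t matrixBytes = (size_t)numThreads * numCounters * sizeof(unsigned int);
    const size_t totalBytes = (size_t)numCounters * sizeof(unsigned int);

    unsigned int* dCounters = nullptr;
    unsigned int* dTotals = nullptr;
    check(cudaMalloc(&dCounters, matrixBytes), "cudaMalloc counters");
    check(cudaMalloc(&dTotals, totalBytes), "cudaMalloc totals");
    check(cudaMemset(dCounters, 0, matrixBytes), "cudaMemset counters");

    count_kernel<<<blocks, threadsPerBlock>>>(numItems, numCounters, dCounters);
    check(cudaGetLastError(), "count_kernel launch");

    const int sumBlocks = (numCounters + threadsPerBlock - 1) / threadsPerBlock;
    sum_kernel<<<sumBlocks, threadsPerBlock>>>(numThreads, numCounters, dCounters, dTotals);
    check(cudaGetLastError(), "sum_kernel launch");

    std::vector<unsigned int> totals(numCounters);
    // cudaMemcpy waits for the kernels on the default stream to finish.
    check(cudaMemcpy(totals.data(), dTotals, totalBytes, cudaMemcpyDeviceToHost),
          "cudaMemcpy totals");
    cudaFree(dCounters);
    cudaFree(dTotals);
    return totals;
}

int main() {
    const int numItems = 1000000;  // far more work items than launched threads
    const int numCounters = 1000;  // small here; my real case needs ~60000
    std::vector<unsigned int> totals = count_on_device(numItems, numCounters, 8, 64);

    unsigned long long sum = 0;
    for (unsigned int v : totals) sum += v;
    assert(sum == (unsigned long long)numItems);  // every item counted exactly once
    std::printf("counted %llu items\n", sum);
    return 0;
}
```

With this layout the matrix size depends on the number of threads I launch (8 * 64 = 512 rows here) rather than on the number of work items, which is the property I am after. It should build with something like `nvcc counters.cu -o counters` (the file name is arbitrary).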
Do you know how I can get such an identifier? Or do you have any hints for an approach that would avoid needing one?
Thanks a lot,