I am attempting to implement a Fuzzy C Means algorithm on the GPU.
During the cluster membership update phase, the CPU reference implementation checks whether any of the distances for each feature in a sample is smaller than some epsilon. If so, the distance value for that particular feature is set to 1, while the distance values for the remaining features are set to 0. This avoids divide-by-zero problems in the final cluster membership update.
I've seen example GPU code that works around this problem by adding a small constant (0.001) to each distance during the distance calculation phase, since the threads computing the individual distances cannot easily coordinate this kind of check with each other.
Unfortunately, adopting this workaround would mean either changing the CPU reference implementation or living with larger-than-preferred errors between the GPU and CPU results.
Has anybody else encountered a similar issue? If so, how did you solve it? Any opinions?