I have two matrices (images), A and B, where each pixel of B can affect several pixels of A. A is the resulting image.
When I launch a kernel, each thread carries out the operation for one pixel of B, and the result affects the corresponding pixels of A. The problem is that a pixel of A can receive contributions from several pixels of B. When that happens, the contributions do not accumulate: each thread, on finishing its operation, writes its result to the corresponding pixel of A and overwrites whatever value was already there (an expected result, since all the threads run in parallel).
The aim is for the contributions of all the threads to be taken into account when computing A.
I know this problem is inherent to the parallel nature of the process, but I think there must be some way to do it, perhaps using shared memory or some criterion based on latency… Does anybody have an idea?
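Or organize your kernel so that you have one thread per pixel of image A. Each thread then sums the contributions from the relevant pixels of B itself, so no two threads ever write to the same location.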
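To make the one-thread-per-pixel-of-A idea concrete, here is a minimal sketch. The actual operation was not described, so it assumes a made-up rule: each pixel of B adds its own value to every pixel of A in the 3x3 window centred on it. The names `gatherKernel` and `gatherOnDevice` are hypothetical, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per pixel of A. ASSUMPTION (not from the original post): a pixel
// of B contributes its value to every pixel of A in the 3x3 window around it.
// Seen from A, pixel (x, y) therefore receives B over the same 3x3 window.
__global__ void gatherKernel(const float* B, float* A, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column in A
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row in A
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    // Visit every pixel of B whose window covers (x, y).
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int bx = x + dx;
            int by = y + dy;
            if (bx >= 0 && bx < width && by >= 0 && by < height)
                sum += B[by * width + bx];
        }
    }
    A[y * width + x] = sum;  // this thread is the only writer of this pixel
}

// Hypothetical host helper: copies B to the device, runs the kernel and
// returns A. Both images are width x height, stored row by row.
std::vector<float> gatherOnDevice(const std::vector<float>& hostB,
                                  int width, int height)
{
    size_t bytes = hostB.size() * sizeof(float);
    float* dA = nullptr;
    float* dB = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dB, hostB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    gatherKernel<<<grid, block>>>(dB, dA, width, height);

    std::vector<float> hostA(hostB.size());
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(hostA.data(), dA, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    return hostA;
}

int main()
{
    const int width = 4, height = 4;
    std::vector<float> B(width * height, 1.0f);  // every pixel of B is 1
    std::vector<float> A = gatherOnDevice(B, width, height);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x)
            printf("%4.0f", A[y * width + x]);
        printf("\n");
    }
    return 0;
}
```

With B filled with ones, each pixel of A ends up holding the number of B pixels that reach it: 4 in the corners, 6 along the edges and 9 in the interior. This only works when you can enumerate, for a given pixel of A, which pixels of B contribute to it. If that inverse mapping is hard to write down, keeping one thread per pixel of B and accumulating into A with `atomicAdd` is another option, at the cost of serializing writes that collide.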
Do we get a reward for posting these solutions? Otherwise, the CUDA Programming and Development forum might be better suited to this kind of question.