I have a question about how the threads in a block access global memory. As far as I know, any thread can read and write any location of an array in global memory. But I cannot find out what happens if two or more threads try to access the same array location at the same time.
Memory on the GPU is organized a bit like memory on an x86 system
with its local caches. GPUs have their own local memory area
with dedicated access, and there is global ‘device’ memory that can be
copied to and from by both the host and each of the GPU processors.
When multiple threads access the same device memory location, the hardware
makes sure that all of them get access, though the order of access is arbitrary.
Problems that occur because one thread writes before another
reads are the classic parallel programming problem, which you have to design
your program to avoid.
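To make the ordering issue concrete, here is a minimal CUDA Fortran sketch, not taken from your code. The module, kernel, and variable names (`counters`, `count_threads`, `racy_count`, `atomic_count`) are made up for illustration, and it assumes a compiler with the `cudafor` module. Every thread increments the same two global-memory locations, once with a plain update and once with `atomicadd`.

```fortran
module counters
  use cudafor
  implicit none
  ! Hypothetical module-level device variables, both shared by all threads
  integer, device :: racy_count    ! updated with a plain read-modify-write
  integer, device :: atomic_count  ! updated with an atomic operation
contains
  attributes(global) subroutine count_threads()
    integer :: old
    ! Unsafe: threads may read the same old value before any of them
    ! writes back, so increments can be lost.
    racy_count = racy_count + 1
    ! Safe: the hardware serializes the read-modify-write on this location.
    old = atomicadd(atomic_count, 1)
  end subroutine count_threads
end module counters

program race_demo
  use cudafor
  use counters
  implicit none
  integer :: h_racy, h_atomic, istat

  ! Assigning to device variables from host code copies the value over
  racy_count = 0
  atomic_count = 0

  ! 4 blocks of 256 threads = 1024 threads in total
  call count_threads<<<4, 256>>>()
  istat = cudaDeviceSynchronize()

  ! Copy the results back to the host
  h_racy = racy_count
  h_atomic = atomic_count
  print *, 'racy count:  ', h_racy
  print *, 'atomic count:', h_atomic
end program race_demo
```

The atomic counter should end up equal to the number of threads launched, while the plain counter will usually be smaller and can differ from run to run. Atomics are only one way to avoid the conflict; arranging for each thread to write its own array element avoids it entirely.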
I hope I answered your question.
Yes you did. Thank you!
Maybe you can help with another problem:
I want to work with different subroutines in my code. But I have a problem passing more than 4 arguments from the global to the device subroutines. That is probably not enough. Do you know any other solution to exchange data between a global and a device subroutine?
But I have a problem passing more than 4 arguments from the global to the device subroutines. That is probably not enough. Do you know any other solution to exchange data between a global and a device subroutine?
While there is a limit on the number of arguments that can be passed, you should be able to pass more than 4. Given your previous posts, I believe that you're encountering a different compiler error having to do with how the arguments are being passed rather than a fixed limit. Note that the actual limit is based on the size of the available space in constant memory.
Do you know any other solution to exchange data between a global and a device subroutine?
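You can use global memory that is shared through a module (i.e. declare your device variables in your module's definition section), so that the global and device subroutines contained in that module can all access them.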
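As a rough sketch of both points (module data shared between routines, and a device subroutine taking more than 4 arguments), something like the following should work. All names here (`kernels`, `combine`, `fill`, `scale_factor`, `work`) are hypothetical and not from your code, and it assumes the global and device subroutines live in the same module.

```fortran
module kernels
  use cudafor
  implicit none
  ! Module-level device data: visible to every global and device
  ! subroutine contained in this module, without being passed as arguments.
  real, device :: scale_factor
  real, device, allocatable :: work(:)
contains
  ! Device subroutine with six arguments (more than 4)
  attributes(device) subroutine combine(a, b, c, d, e, res)
    real, intent(in)  :: a, b, c, d, e
    real, intent(out) :: res
    ! scale_factor comes from the module, not from the argument list
    res = scale_factor * (a + b + c + d + e)
  end subroutine combine

  attributes(global) subroutine fill(n)
    integer, value :: n
    integer :: i
    real :: x
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) then
      x = real(i)
      ! Each thread writes its own element of the module array
      call combine(x, 1.0, 2.0, 3.0, 4.0, work(i))
    end if
  end subroutine fill
end module kernels

program module_demo
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 8
  real :: h_work(n)

  allocate(work(n))      ! allocates device memory for the module array
  scale_factor = 2.0     ! host-to-device copy of a scalar
  call fill<<<1, 32>>>(n)
  h_work = work          ! device-to-host copy
  print *, h_work
  deallocate(work)
end program module_demo
```

Because `scale_factor` and `work` are declared in the module, neither has to appear in any argument list, which keeps the argument count down if you do run into a limit.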
Hope this helps,