Thanks for the reply. I intentionally do not want to rely on the strengths of the CPU. Rather, I need to “benchmark” the individual ALUs on a GPU to find any differences in computing latency among them. So I want to assign the same (or a similar) set of instructions to several specific ALUs, measure the time each one needs to execute that set, and compare the results to see whether there are any differences.
In general, I want to check a GPU for possible sources of race conditions. The first one I thought of is a potential, minuscule difference in execution speed between the different ALUs. Maybe you guys know of other potential sources of race conditions.
However, since my goal is pretty much the opposite of the typical use of a GPU (parallelization, etc.), it is rather difficult for me to see how I can access an individual ALU at a low level with the common tools.
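To make the idea concrete, here is a rough sketch of the kind of measurement I have in mind, assuming CUDA on an NVIDIA GPU (that choice, the kernel name time_fma_chain and all sizes are just my assumptions for illustration). As far as I can tell, CUDA gives no way to pin a thread to one physical ALU, so the sketch can only record which SM each block ran on (PTX register %smid) and time a dependent chain of multiply-adds per thread with clock64(). Threads of one warp are issued together, so I would expect nearly identical numbers within a warp, and the compiler may still move instructions around the timer reads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the ID of the SM this thread runs on (PTX special register %smid).
__device__ unsigned int sm_id() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical kernel: every thread runs the same dependent FMA chain
// and records how many SM clock cycles it took.
__global__ void time_fma_chain(long long *cycles, unsigned int *sm,
                               float *sink, int iters) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float x = 1.0f + tid * 1e-7f;

    long long start = clock64();
    for (int i = 0; i < iters; ++i) {
        x = fmaf(x, 1.000001f, 0.5f);  // each step depends on the previous one
    }
    long long stop = clock64();

    cycles[tid] = stop - start;
    sm[tid] = sm_id();
    sink[tid] = x;  // store the result so the loop is not optimized away
}

int main() {
    const int blocks = 8, threads = 32, n = blocks * threads, iters = 4096;

    long long *cycles;
    unsigned int *sm;
    float *sink;
    cudaMallocManaged(&cycles, n * sizeof(long long));
    cudaMallocManaged(&sm, n * sizeof(unsigned int));
    cudaMallocManaged(&sink, n * sizeof(float));

    time_fma_chain<<<blocks, threads>>>(cycles, sm, sink, iters);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // One block is one warp here; print the spread of cycle counts across its lanes.
    for (int b = 0; b < blocks; ++b) {
        long long lo = cycles[b * threads], hi = lo;
        for (int t = 1; t < threads; ++t) {
            long long c = cycles[b * threads + t];
            if (c < lo) lo = c;
            if (c > hi) hi = c;
        }
        std::printf("block %d on SM %u: min %lld, max %lld cycles\n",
                    b, sm[b * threads], lo, hi);
    }

    cudaFree(cycles);
    cudaFree(sm);
    cudaFree(sink);
    return 0;
}
```

This only compares SMs and warps, not individual ALUs, which is exactly the part I do not know how to reach.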