My scenario is to run just one thread or one kernel on each core.
The mean cause is that i am not running a real parallel operation, but my operation is taking too long ( about one day), and my goal is to get the maximum number of results in each runtime. ie: i need to run this ‘taking long’ operation just one per core, because this operation contains a huge while loop, an it takes on a normal CPU the maximum of one CPU Core.
for example: if my GPU contains 48 cores, how can i run my kernel just 48 times on each one must run on a different core ??