I have multi-threaded code in which each thread uses GPUs.
It loops over:
- get data
- spawn threads: each thread processes its data, running 2 subtasks on GPUs.
It works fine, and I get good performance.
Except that the first calls to clGetPlatformIDs return -1001 (CL_PLATFORM_NOT_FOUND_KHR). Because the code has an alternative path that runs on CPUs, it executes the first subtasks on CPUs, which increases the overall elapsed time.
Any idea why?
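
To make the structure concrete, here is a minimal sketch of the pattern described above. It is not my actual code, and it is an assumption-laden illustration: `FakePlatformQuery` is a hypothetical stand-in for clGetPlatformIDs (so the sketch builds without OpenCL installed) that simply simulates a few failing calls, and `process_item`, `run_batch`, and `Counters` are made-up names. It only shows how each failed platform query pushes one subtask onto the CPU fallback; it does not reproduce or explain the real failure.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Value of CL_PLATFORM_NOT_FOUND_KHR (from the Khronos cl_khr_icd extension).
constexpr int kPlatformNotFound = -1001;

// Hypothetical stand-in for clGetPlatformIDs: the first `failures` calls
// return -1001, every later call returns 0 (the value of CL_SUCCESS).
struct FakePlatformQuery {
    std::atomic<int> remaining_failures;
    explicit FakePlatformQuery(int failures) : remaining_failures(failures) {}
    int operator()() {
        return remaining_failures.fetch_sub(1) > 0 ? kPlatformNotFound : 0;
    }
};

// Counts how many subtasks took each path.
struct Counters {
    std::atomic<int> gpu{0};
    std::atomic<int> cpu{0};
};

// One worker: query the platform, then pick the GPU path or the CPU fallback.
void process_item(FakePlatformQuery& query, Counters& counters) {
    const int err = query();
    if (err == 0) {
        counters.gpu++;  // the GPU subtask would run here
    } else {
        counters.cpu++;  // the slower CPU fallback would run here
    }
}

// Spawn one thread per data item and wait for all of them.
void run_batch(int items, FakePlatformQuery& query, Counters& counters) {
    assert(items >= 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < items; ++i) {
        workers.emplace_back(process_item, std::ref(query), std::ref(counters));
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

With a query object built as `FakePlatformQuery query(3)` and a call to `run_batch(8, query, counters)`, three of the eight subtasks land on the CPU path, which mirrors what I observe at startup.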