Questions about global and local work size

:-) you’re welcome

An irregular matrix size can be handled in two ways (I'm not an expert and can't say which one is faster):

  • add an if to your kernel code. The condition checks whether the thread lies beyond the edge of the problem and, if so, makes it return immediately.
  • pad the matrix to make its size regular. All threads then do the same amount of work, but some of the output is useless. (If you go this way, I would suggest initializing the padding. I saw very slow behavior when the GPU had to deal with floating-point exceptions and NaNs.)
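
To make the two options concrete, here is a rough sketch in plain C that emulates them on the host. It is not real OpenCL code: in an actual kernel the index would come from get_global_id(0), and the names round_up, scale_item, run_guarded and make_padded are made up for this illustration. It assumes a 1D problem of n elements and a global size rounded up to a multiple of the local size.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of local_size (local_size must be > 0). */
static size_t round_up(size_t n, size_t local_size)
{
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Option 1: the body of a guarded work-item.
 * In an OpenCL kernel, gid would be get_global_id(0) and n a kernel argument.
 * The doubling is just a placeholder computation (an assumption). */
static void scale_item(size_t gid, size_t n, const float *in, float *out)
{
    if (gid >= n)
        return;                 /* past the edge of the problem: do nothing */
    out[gid] = 2.0f * in[gid];
}

/* Host-side emulation of launching a rounded-up number of work-items. */
static void run_guarded(size_t n, size_t local_size,
                        const float *in, float *out)
{
    size_t global = round_up(n, local_size);
    for (size_t gid = 0; gid < global; ++gid)
        scale_item(gid, n, in, out);
}

/* Option 2: copy the data into a padded buffer whose padding is initialized.
 * calloc zero-fills the buffer, so the padding holds 0.0f instead of garbage
 * that might happen to be NaN. The caller frees the result. */
static float *make_padded(const float *src, size_t n, size_t local_size,
                          size_t *padded_n)
{
    *padded_n = round_up(n, local_size);
    float *buf = calloc(*padded_n, sizeof *buf);
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, n * sizeof *buf);
    return buf;
}
```

With n = 10 and a local size of 4, both options run 12 work-items. With the guard, the last two return without touching memory; with padding, they compute on zeros and their output is simply ignored.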

Thanks again. I really appreciate your quick answers.
