You can’t choose the grid and block size arbitrarily. They are determined by how much work each thread performs in the kernel and by the total amount of work to be done: the launch has to supply enough threads to cover all of it. Have a look at the SDK matrix multiplication sample for a good example of how to do this. It is also discussed in quite a lot of detail in the programming guide.
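As a rough illustration of that relationship, here is a minimal host-side sketch for a 1D launch. The helper names `ceilDiv` and `blocksNeeded` and their parameters are made up for this post (they are not from the SDK sample or the programming guide), and the sketch assumes every thread handles the same fixed number of elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: integer division rounded up, so a partial
// final chunk of work still gets a thread (or block) assigned to it.
constexpr std::size_t ceilDiv(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// Hypothetical helper: number of blocks needed for a 1D launch.
//   totalWork       - total number of elements to process
//   workPerThread   - elements each thread handles inside the kernel
//   threadsPerBlock - the block size you picked (for example 256)
std::size_t blocksNeeded(std::size_t totalWork,
                         std::size_t workPerThread,
                         std::size_t threadsPerBlock) {
    assert(workPerThread > 0 && threadsPerBlock > 0);
    // Threads required to cover all of the work.
    std::size_t threadsNeeded = ceilDiv(totalWork, workPerThread);
    // Blocks required to supply that many threads.
    return ceilDiv(threadsNeeded, threadsPerBlock);
}
```

For example, 1000 elements at one element per thread with 256 threads per block needs 4 blocks. That launches 1024 threads, so the kernel has to check its index against the element count and let the extra 24 threads do nothing. The returned value is what you would use as the grid size in the kernel launch.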