Hi.
In my algorithm, I have a matrix of size 1000 x 1000.
Now I am trying to write a kernel function in which each thread maps to one (i, j) coordinate of the matrix, but I am getting an error.
My question is: is there a limit on how many threads or blocks can execute one kernel at a time (I am asking because I heard that there is), or is it just
that I am giving the function too much memory to deal with? I ask because it copes with a matrix of size 100 x 100.
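
To make the mapping concrete, here is a minimal sketch of the kind of launch I mean. It is not my exact code: the kernel name touch_matrix, the helper blocks_for, the 16 x 16 block size and the placeholder write are all assumptions made up for illustration. It checks the error codes returned by the runtime, so a bad launch configuration would show up separately from an allocation failure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per (i, j) element of an n x n matrix.
__global__ void touch_matrix(float *m, int n) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (i < n && j < n) {        // guard: the grid may overshoot the matrix
        m[i * n + j] = 1.0f;     // placeholder work
    }
}

// Hypothetical helper: blocks needed to cover n elements (ceiling division).
int blocks_for(int n, int block) { return (n + block - 1) / block; }

int main() {
    const int n = 1000;
    float *d_m = nullptr;

    // Allocation failure would point to memory rather than launch limits.
    cudaError_t err = cudaMalloc(&d_m, (size_t)n * n * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    dim3 block(16, 16);  // assumed block size: 256 threads per block
    dim3 grid(blocks_for(n, block.x), blocks_for(n, block.y));

    touch_matrix<<<grid, block>>>(d_m, n);

    // An invalid launch configuration is reported here.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
    }

    // Errors raised while the kernel runs are reported here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_m);
    return 0;
}
```

As far as I understand, the per-block and per-grid limits depend on the device and can be read with cudaGetDeviceProperties (fields maxThreadsPerBlock and maxGridSize), so the sketch splits the matrix over many small blocks instead of putting all the threads into one.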