I need some help.
I want to split a matrix into chunks and then process each chunk in a different stream. I don’t know the width of the matrix at compile time; I read a file and get the width from it at run time.
When I compile the code I get the message: “error: expression must have a constant value”
I could declare a large fixed number of streams and use only as many as I need, but that does not seem right.
Is there any way to size the stream array at run time?
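This is the relevant part of my code:
// Open the file.
// Get the width of the matrix and save it in the variable k.
cudaStream_t stream[k];  // error here: k is not a compile-time constant
for (int i = 0; i < k; ++i)
    cudaStreamCreate(&stream[i]);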
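For reference, one possible way around the error is to allocate the stream handles dynamically instead of in a fixed-size array. The following is only a minimal sketch, assuming the host code is compiled as C++ by nvcc; the helper names `create_streams` and `destroy_streams` are made up for illustration and are not part of the CUDA API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: creates k streams, where k is only known at run time.
// Returns an empty vector if k <= 0 or if any stream creation fails.
std::vector<cudaStream_t> create_streams(int k) {
    std::vector<cudaStream_t> streams(k > 0 ? k : 0);
    for (std::size_t i = 0; i < streams.size(); ++i) {
        if (cudaStreamCreate(&streams[i]) != cudaSuccess) {
            // Destroy the streams created so far before giving up.
            for (std::size_t j = 0; j < i; ++j) cudaStreamDestroy(streams[j]);
            streams.clear();
            return streams;
        }
    }
    return streams;
}

// Hypothetical helper: destroys every stream and empties the vector.
void destroy_streams(std::vector<cudaStream_t>& streams) {
    for (cudaStream_t s : streams) cudaStreamDestroy(s);
    streams.clear();
}
```

With this, chunk i could be launched as something like `my_kernel<<<grid, block, 0, streams[i]>>>(...)`, where `my_kernel`, `grid`, and `block` are placeholders for the real kernel and launch configuration.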
The other thing I want to ask about is techniques for reducing the number of registers used.
I use around 47 registers per thread, which seems like a lot. I have only defined 10 variables in my kernel;
only two of them are floats, and the rest are unsigned short.
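As far as I understand, the register count does not follow directly from the number of declared variables: registers are 32 bits wide, so an unsigned short usually still takes a whole register, and the compiler also needs registers for addresses and temporaries. The two controls I know of are the `__launch_bounds__` qualifier and the nvcc option `-maxrregcount`. Below is a minimal sketch of the first one; the kernel `scale_chunk`, the helper `scale_on_device`, and the bounds (256, 4) are placeholders, not my real code.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel. __launch_bounds__(256, 4) promises at most 256 threads
// per block and asks for at least 4 resident blocks per multiprocessor, which
// may make the compiler use fewer registers (possibly spilling to local memory).
__global__ void __launch_bounds__(256, 4)
scale_chunk(float* data, unsigned int n, float factor) {
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] *= factor;
}

// Hypothetical host helper: scales n floats (n > 0) on the device in place.
cudaError_t scale_on_device(float* host, unsigned int n, float factor) {
    float* dev = nullptr;
    cudaError_t err = cudaMalloc((void**)&dev, n * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // 256 threads per block, matching the launch bounds above.
        scale_chunk<<<(n + 255) / 256, 256>>>(dev, n, factor);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err;
}
```

Compiling with `nvcc -Xptxas -v` prints the registers used per kernel, so the effect can be checked. `nvcc -maxrregcount=32` caps registers per thread for every kernel in the file instead; the value 32 is just an example.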
Thanks in advance.