I have a situation where each thread needs about 10 MB of working memory, and that space can be reused once the thread finishes. The number of threads is on the order of 1 million. The size of the array is the same for every thread and is known at compile time.
If the amount were only about 1 KB, I would just declare a fixed-size array in the code and let it go on the stack, but that’s not an option for 10 MB.
If the number of threads were much smaller, I would pre-allocate an array of (number of threads) × 10 MB and give each thread its own slice. With this many threads, however, such an array would be around 10 TB, so that’s no good either.
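For scale, this is the arithmetic for the one-slice-per-thread layout, using decimal units (1 MB = 10^6 bytes):

```latex
10^{6}\ \text{threads} \times 10\ \text{MB/thread} = 10^{7}\ \text{MB} = 10^{13}\ \text{bytes} = 10\ \text{TB}
```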
What I’m doing now is pre-allocating an array of about 2 × (maximum number of resident threads) × 10 MB. Each thread uses the slot that starts at byte offset 10 MB × ((thread index) % (2 × maximum number of resident threads)). However, I understand that this is a bad idea. The order in which threads begin and end is not strictly guaranteed, so in principle two active threads could get the same slot and corrupt each other’s part of the buffer.
Is there a proper way to solve this problem? Thanks.