Here is the big problem:
Each thread does some operations on data received through a pointer (each thread is given a part of the input buffer, the same amount for each thread), but the amount of output differs per thread. Each thread outputs less data than it was given (for example, one produces 900k, another 500k, and so on). The output pointer has to hold the data in order: thread 1's data must come first, then thread 2's, and so on. Since each thread produces a different amount of output, the problem is synchronization.
My idea was: using the initial block size, I would pad each thread's output with junk until it reaches the size of the data that was sent to that thread (this wouldn't need synchronization). The problem is that the resulting buffer, 28MB of needed data mixed with junk, then has to be cleaned of the junk, which has a big time cost and also a memory cost.
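To make the padding idea concrete, here is a minimal sketch of what I mean. It assumes C++ with std::thread, and the names `keep_even` and `padded_then_compact` are made up for illustration; `keep_even` is only a stand-in for my real computation. Instead of writing actual junk markers, the sketch records how many valid elements each thread wrote, which may not match what the real code would do.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-thread work: keeps only the even
// values, so each thread produces fewer elements than it was given.
static std::size_t keep_even(const int* in, std::size_t n, int* out) {
    std::size_t written = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (in[i] % 2 == 0) out[written++] = in[i];
    }
    return written;
}

// Padded approach: thread t owns the output slot [t*chunk, (t+1)*chunk), so
// the threads never touch the same memory and need no synchronization. The
// unused tail of every slot is junk that must be squeezed out afterwards.
std::vector<int> padded_then_compact(const std::vector<int>& input,
                                     std::size_t num_threads) {
    // Assumption: the input splits evenly, as in my case.
    assert(num_threads > 0 && input.size() % num_threads == 0);
    const std::size_t chunk = input.size() / num_threads;

    std::vector<int> padded(input.size());          // as large as the input: the memory cost
    std::vector<std::size_t> used(num_threads, 0);  // valid elements in each slot

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            used[t] = keep_even(input.data() + t * chunk, chunk,
                                padded.data() + t * chunk);
        });
    }
    for (auto& w : workers) w.join();

    // Serial compaction pass over the whole padded buffer: the time cost.
    std::vector<int> result;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const int* slot = padded.data() + t * chunk;
        result.insert(result.end(), slot, slot + used[t]);
    }
    return result;
}
```

The worker part is cheap, but the last loop touches the whole buffer again on a single thread, which is the cost I would like to avoid.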
Is there a way to write to the output pointer serially, in thread order: thread 1, thread 2 … thread 200?
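To show what I mean by writing in thread order, here is a rough sketch, again assuming C++ with std::thread. The name `ordered_append` and the even-number filter are hypothetical placeholders for my real code. Each thread computes into a private buffer and then waits for its turn before appending to the shared output. I am not sure this is a good way to do it, since the appends are fully serialized.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Each thread filters its own chunk in parallel, then appends its result to
// the shared output only when it is that thread's turn (0, 1, 2, ...).
std::vector<int> ordered_append(const std::vector<int>& input,
                                std::size_t num_threads) {
    // Assumption: the input splits evenly between the threads.
    assert(num_threads > 0 && input.size() % num_threads == 0);
    const std::size_t chunk = input.size() / num_threads;

    std::vector<int> output;
    output.reserve(input.size());  // the output is never larger than the input

    std::mutex m;
    std::condition_variable turn_changed;
    std::size_t next_turn = 0;  // index of the thread allowed to write next

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Parallel part: placeholder computation into a private buffer.
            std::vector<int> local;
            for (std::size_t i = t * chunk; i < (t + 1) * chunk; ++i) {
                if (input[i] % 2 == 0) local.push_back(input[i]);
            }

            // Serial part: block until all earlier threads have written.
            std::unique_lock<std::mutex> lock(m);
            turn_changed.wait(lock, [&] { return next_turn == t; });
            output.insert(output.end(), local.begin(), local.end());
            ++next_turn;
            lock.unlock();
            turn_changed.notify_all();  // wake the thread whose turn is next
        });
    }
    for (auto& w : workers) w.join();
    return output;
}
```

This gives the ordering I want without junk in the output, but it needs one private buffer per thread and every thread has to wait for all the threads before it.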