I’m new to this forum, so hello to all.
I’m not even a beginner: we haven’t even started with CUDA yet. So please forgive the naive question.
Also: I haven’t scanned (in parallel, ha ha) all the topics of this forum yet, so forgive me again, please, if the answers have already been posted somewhere else.
We have an application very well suited to parallelism. For instance, in a representative case, it took 24 seconds to compute without multi-threading, and less than 2 seconds with multi-threading.
That’s already an achievement. All this on a 4-core i7. We’ve ordered a 12-core AMD Threadripper beast, and we’ll probably get under one second.
But the BIG achievement will be to go real time. For that we need to get under a few tenths of a second.
Of course we could wait for the next beast with 64 cores or more, but it would be unaffordable.
So we’re evaluating CUDA, as you can guess. We haven’t started yet, just downloaded it.
First of all, the software already works in parallel, so the memory is already well separated. There are no mutexes, atomics, or anything like that: there is no memory shared for writing at all. For reading: yes.
But it’s all std::vectors. And the functions create and resize many big std::vectors.
Is that an issue?
Sorry again: it’s probably a beginner question.
Secondly, the algorithms are not small. It’s not just computing a matrix or something like that. It’s a full, real library. Are there limitations there?
There is also a lot of arithmetic on doubles. Limitations there too?
Generally speaking: all info about limitations compared with “normal” C++ code is welcome, and links to forum threads that already discuss this are very welcome too.
Thanks a lot,