Hi all,
As a newbie in the GPGPU world, I'd like to validate my understanding with you.
My goal is to maximise parallel computation.
I have a GeForce GTX 850M. Thanks to the deviceQuery sample, I see it has 5 multiprocessors with 128 CUDA cores each, so 640 CUDA cores in total.
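For context, this is roughly how I am computing that number myself (a minimal sketch; the 128 cores/SM figure is hard-coded, assuming Maxwell / compute capability 5.0, since `cudaGetDeviceProperties` does not report cores per SM directly):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // device 0

    // multiProcessorCount gives the SM count; cores per SM are not
    // reported by the runtime API -- 128 is my assumption for Maxwell.
    int coresPerSM = 128;
    printf("SMs: %d, total CUDA cores: %d\n",
           prop.multiProcessorCount,
           prop.multiProcessorCount * coresPerSM);
    return 0;
}
```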
My understanding,
For a given instruction, 640 threads can execute it at the same time.
Correct?
Since the warp is the minimum group of threads (32), I can run 640/32 = 20 blocks of 32 threads at the same time. Then I can run 20 streams of 32 threads at the same time.
Correct?
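To make sure we are talking about the same thing, by "20 blocks of 32 threads" I mean a launch configuration like this (a sketch; the kernel and its work are just placeholders I made up):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel, only here to show the launch shape.
__global__ void scale(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index, 0..639
    data[i] *= 2.0f;  // example work
}

int main() {
    float *d_data;
    cudaMalloc(&d_data, 640 * sizeof(float));

    // 20 blocks x 32 threads = 640 threads, one per CUDA core (my assumption).
    scale<<<20, 32>>>(d_data);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```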
Since streams can run concurrently, I can have 20 streams running 20 different kernels (1 kernel per stream) at the same time.
Correct?
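Concretely, here is what I have in mind for the streams (again a sketch; `work` and `d_data` are placeholders, and I realize launches in different streams are only *eligible* to overlap, which is part of what I'm asking about):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel: one warp of work per stream.
__global__ void work(float *data) {
    data[threadIdx.x] += 1.0f;
}

int main() {
    const int nStreams = 20;
    float *d_data;
    cudaMalloc(&d_data, nStreams * 32 * sizeof(float));

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        // Asynchronous launch into its own stream:
        // <<<1 block, 32 threads, 0 shared mem, stream>>>
        work<<<1, 32, 0, streams[s]>>>(d_data + s * 32);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    return 0;
}
```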
Thank you all in advance.
Regards