I’m a French student currently doing an internship.
My job is to modify a program (already written in C++) so that it runs on CUDA.
I have a Quadro 4000, and I don’t know a thing about CUDA (I tried to read the programming guide, but it’s still obscure to me).
For this specific GPU, there are 256 cores, a maximum of 1024 threads per block, and each block has 48 KB of shared memory, right?
What is the link between cores and blocks? Or between cores and multiprocessors?
Is it 8 MPs with 32 cores per MP, so 8 * 32 = 256 cores?
And with 8 blocks per MP, do I have 64 blocks of 1024 threads?
And what’s the difference between threads and resident threads? If I have at most 1024 threads per block and 8 blocks per MP, I would expect 8 * 1024 = 8192 threads per MP, not 1536.
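To make the arithmetic behind this question concrete, here is a minimal sketch in plain C++ (no CUDA needed). It assumes that the two per-MP limits quoted above (1536 resident threads and 8 resident blocks) both apply at the same time, so the tighter one decides. The function names are hypothetical, and the sketch deliberately ignores other limits such as shared memory and registers.

```cpp
#include <algorithm>
#include <cassert>

// Number of blocks that can be resident on one multiprocessor (MP) at once,
// ASSUMING only two limits matter: resident threads per MP and resident
// blocks per MP. Shared memory and register usage are not modelled here.
int residentBlocksPerMP(int threadsPerBlock,
                        int maxResidentThreadsPerMP,
                        int maxResidentBlocksPerMP) {
    assert(threadsPerBlock > 0);
    // How many whole blocks fit under the resident-thread limit.
    int byThreads = maxResidentThreadsPerMP / threadsPerBlock;
    // The tighter of the two limits wins.
    return std::min(byThreads, maxResidentBlocksPerMP);
}

// Resident threads per MP that result from the block count above.
int residentThreadsPerMP(int threadsPerBlock,
                         int maxResidentThreadsPerMP,
                         int maxResidentBlocksPerMP) {
    return residentBlocksPerMP(threadsPerBlock,
                               maxResidentThreadsPerMP,
                               maxResidentBlocksPerMP) * threadsPerBlock;
}
```

Under that assumption, `residentBlocksPerMP(1024, 1536, 8)` is 1, because only one block of 1024 threads fits under 1536, while `residentBlocksPerMP(192, 1536, 8)` is 8, which reaches the full 1536 resident threads. If this reading is right, 8 blocks per MP and 1024 threads per block are separate maxima that cannot both be reached at once.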
I am wondering how to split 3 loops over a very large number of images so that the program runs as fast as possible (it currently takes several hours).
Thank you, and I hope you can understand my English ^^"
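As a rough idea of what I mean by cutting the loops, here is a minimal C++ sketch of the index arithmetic only. It assumes (and this is only an assumption, since I have not shown the real code) that the 3 loops run over image, row and column, and that every iteration is independent. The names `PixelIndex`, `unflatten` and `blocksNeeded` are made up for illustration. On the GPU, the flat index would come from `blockIdx.x * blockDim.x + threadIdx.x` instead of a loop counter.

```cpp
#include <cassert>
#include <cstddef>

// Position of one unit of work inside the three nested loops
// (hypothetical layout: image, then row, then column).
struct PixelIndex {
    std::size_t image;
    std::size_t row;
    std::size_t col;
};

// Recover the three loop indices from a single flat index, so that one
// thread per flat index can replace the three nested loops.
PixelIndex unflatten(std::size_t flat, std::size_t height, std::size_t width) {
    assert(height > 0 && width > 0);
    PixelIndex p;
    p.col = flat % width;                // innermost loop
    p.row = (flat / width) % height;     // middle loop
    p.image = flat / (width * height);   // outermost loop
    return p;
}

// Number of blocks needed to cover totalWork items, rounding up so the
// last, partially filled block is included.
std::size_t blocksNeeded(std::size_t totalWork, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (totalWork + threadsPerBlock - 1) / threadsPerBlock;
}
```

Because the block count is rounded up, the last block can contain threads whose flat index is past the end of the data, so each thread would have to check its index against the total amount of work before doing anything.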