Multithreaded app, parallel programming newbie >.<

Electro · March 19, 2008, 1:51pm

Hello,

I have already written several sequential apps, but to use Tesla cards more efficiently, I'd like to write a multithreaded app…

I took a look at the programming guide that comes with CUDA, but it seems that the threads are created by the compiler…

Would it be possible to create threads on the device with the pthread library?
Any ideas about what to do?

Thanks in advance for any help!!!

:)

Sarnath · March 19, 2008, 1:56pm

Electro, multithreading is different from hardware threading…

Multithreading libraries (pthreads, etc.) create software threads that run on the CPU.

CUDA threads are hardware elements: your kernel runs inside your graphics card, and your code is the only thing that executes on the GPU (no OS, no libs, etc.)…
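A minimal sketch of the point above, assuming the current CUDA runtime API (`cudaMalloc`, `cudaDeviceSynchronize`): the program never creates threads itself. The `<<<blocks, threadsPerBlock>>>` launch asks the GPU to run `blocks * threadsPerBlock` instances of the kernel, and each instance finds its own element from `blockIdx`, `blockDim` and `threadIdx`. The kernel `scale` and the helper `blocksFor` are hypothetical names chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread scales one element. No thread is created explicitly:
// the kernel launch in main() creates them all.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's global index
    if (i < n)                                       // the last block may be partly unused
        data[i] *= factor;
}

// Hypothetical helper: number of blocks needed to cover n elements (rounded up).
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int n = 1 << 20, threadsPerBlock = 256;
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) { printf("no CUDA device\n"); return 1; }
    cudaMemset(d, 0, n * sizeof(float));

    int blocks = blocksFor(n, threadsPerBlock);
    scale<<<blocks, threadsPerBlock>>>(d, 2.0f, n);  // blocks * threadsPerBlock GPU threads
    cudaDeviceSynchronize();                         // wait for the kernel to finish

    printf("launched %d blocks x %d threads\n", blocks, threadsPerBlock);
    cudaFree(d);
    return 0;
}
```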

Got it, Mr. Electro??

Electro · March 19, 2008, 2:05pm

So, if I understood correctly, the fact that the GPU runs what I wrote on more than one of its processors is transparent to me?

Sarnath · March 19, 2008, 2:11pm

Your question is NOT transparent to me :-)

You execute your EXE on the CPU. The EXE copies your GPU kernel to the GPU device and starts its execution. The processors inside the GPU execute your code, and the results are copied back to the CPU, where you print them… Does this sound clear to you?
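The round trip described above might look like this, as a sketch using the current CUDA runtime API; `addOne` and `addOneOnGpu` are hypothetical names. Each numbered comment is one step: allocate device memory, copy the input over, launch the kernel, copy the results back, then print on the CPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: runs on the GPU's processors, one thread per element.
__global__ void addOne(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

// Host code (hypothetical helper). Returns cudaSuccess if every step succeeded.
cudaError_t addOneOnGpu(float *h, int n)
{
    size_t bytes = n * sizeof(float);
    float *d = nullptr;
    cudaError_t err = cudaMalloc(&d, bytes);                       // 1. allocate device memory
    if (err == cudaSuccess)
        err = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);     // 2. copy input to the GPU
    if (err == cudaSuccess) {
        addOne<<<(n + 255) / 256, 256>>>(d, n);                    // 3. launch the kernel
        err = cudaGetLastError();                                  //    catch launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);     // 4. copy results back (waits for the kernel)
    cudaFree(d);
    return err;
}

int main()
{
    float h[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    if (addOneOnGpu(h, 4) != cudaSuccess) { printf("CUDA error\n"); return 1; }
    for (int i = 0; i < 4; ++i) printf("%g ", h[i]);               // 5. print on the CPU
    printf("\n");
    return 0;
}
```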

Electro · March 19, 2008, 2:25pm

It does !!!

What I want to do is create an app that runs the same corner turn, matrix calculation, or FFT 128 times. I have already written apps that each perform one of these operations once, and I'd like to know how to modify or rewrite them to reach this goal.

How to do that is unclear to me.

That was what i meant…
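One common way to "do the same thing 128 times" is to put all 128 problems into a single kernel launch instead of looping on the host. Below is a naive sketch for the corner turn (a matrix transpose), assuming the matrices are stored back to back in row-major order; `batchedTranspose` and `transposeBatchOnGpu` are hypothetical names, and the kernel makes no attempt at coalesced memory access (a tiled shared-memory transpose would be faster). The grid-stride loop lets a fixed grid of 256 blocks cover every element of every matrix.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Naive batched corner turn: transposes `batch` row-major rows x cols matrices
// stored back to back in `in`, writing the cols x rows results to `out`.
__global__ void batchedTranspose(const float *in, float *out,
                                 int rows, int cols, int batch)
{
    long long perMatrix = (long long)rows * cols;
    long long total = perMatrix * batch;
    long long stride = (long long)gridDim.x * blockDim.x;
    for (long long e = (long long)blockIdx.x * blockDim.x + threadIdx.x;
         e < total; e += stride) {
        long long m = e / perMatrix;                 // which matrix
        int rem = (int)(e % perMatrix);
        int r = rem / cols, c = rem % cols;          // position inside that matrix
        out[m * perMatrix + (long long)c * rows + r] = in[e];
    }
}

// Hypothetical host helper: one launch handles the whole batch.
cudaError_t transposeBatchOnGpu(const float *hIn, float *hOut,
                                int rows, int cols, int batch)
{
    size_t bytes = (size_t)rows * cols * batch * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        batchedTranspose<<<256, 256>>>(dIn, dOut, rows, cols, batch);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}

int main()
{
    const int rows = 64, cols = 32, batch = 128;     // example sizes
    std::vector<float> in((size_t)rows * cols * batch), out(in.size());
    for (size_t i = 0; i < in.size(); ++i) in[i] = (float)i;
    if (transposeBatchOnGpu(in.data(), out.data(), rows, cols, batch) != cudaSuccess) {
        printf("CUDA error\n");
        return 1;
    }
    // out[1] is element (row 1, col 0) of the first input matrix
    printf("out[1] = %g (expected %g)\n", out[1], in[cols]);
    return 0;
}
```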

jordyvaneijk · March 19, 2008, 2:53pm

Hi Electro,

There is an FFT library in CUDA (CUFFT); maybe you can take a look at that one… Also, from my experience, it is very hard to rewrite sequential CPU code so that it is also fast on the GPU.

That is why, when I first started, I began completely from scratch.

Good luck
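For the FFT case, CUFFT can already do the batching: the last argument of `cufftPlan1d` is the number of transforms of length `nx` to run, so 128 FFTs become one plan and one `cufftExecC2C` call. A sketch assuming in-place, forward, complex-to-complex transforms with the signals stored back to back; `fftBatchOnGpu` is a hypothetical helper, and the program must be linked with `-lcufft`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Runs `batch` independent forward C2C FFTs of length `nx`, in place,
// with one plan and one cufftExecC2C call.
// Signal b occupies h[b * nx] .. h[b * nx + nx - 1].
cufftResult fftBatchOnGpu(cufftComplex *h, int nx, int batch)
{
    size_t bytes = sizeof(cufftComplex) * (size_t)nx * batch;
    cufftComplex *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return CUFFT_ALLOC_FAILED;

    cufftResult r = CUFFT_SUCCESS;
    if (cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) != cudaSuccess)
        r = CUFFT_EXEC_FAILED;

    cufftHandle plan;
    if (r == CUFFT_SUCCESS)
        r = cufftPlan1d(&plan, nx, CUFFT_C2C, batch);   // batch = number of FFTs
    if (r == CUFFT_SUCCESS) {
        r = cufftExecC2C(plan, d, d, CUFFT_FORWARD);    // all FFTs in one call
        cufftDestroy(plan);
    }
    if (r == CUFFT_SUCCESS &&
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
        r = CUFFT_EXEC_FAILED;

    cudaFree(d);
    return r;
}

int main()
{
    const int nx = 1024, batch = 128;                   // example sizes
    std::vector<cufftComplex> h((size_t)nx * batch);
    for (int b = 0; b < batch; ++b) {                   // an impulse in every signal
        for (int k = 0; k < nx; ++k) { h[b * nx + k].x = 0.0f; h[b * nx + k].y = 0.0f; }
        h[b * nx].x = 1.0f;
    }
    if (fftBatchOnGpu(h.data(), nx, batch) != CUFFT_SUCCESS) { printf("CUFFT error\n"); return 1; }
    // The FFT of an impulse is flat, so every bin should be close to 1 + 0i.
    printf("signal 127, bin 5: %g %+gi\n", h[127 * nx + 5].x, h[127 * nx + 5].y);
    return 0;
}
```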

Sarnath · March 19, 2008, 3:30pm

You have to re-write things completely for CUDA.

Meet Mr. CUDA! Forget your existing code :-)

Electro · March 19, 2008, 3:40pm

The apps I wrote already use CUDA, with the CUFFT and CUBLAS libraries…

Forgetting that code is not a problem, but my real problem remains: I have no idea how to optimize CUDA applications so that they run on as many GPU processors as I want…

:mellow:

Electro · March 19, 2008, 3:56pm

I forgot to add that the apps I wrote run on a CUDA-capable device (I set that device to be my Tesla card, not the graphics card).
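If the Tesla card is chosen by name, the selection could look like the sketch below, using `cudaGetDeviceCount`, `cudaGetDeviceProperties` and `cudaSetDevice`. Matching on the substring "Tesla" is an assumption for illustration, and `findDeviceByName` is a hypothetical helper. The same `cudaDeviceProp` structure also reports `multiProcessorCount`, i.e. how many multiprocessors the kernel's blocks get spread over.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: index of the first CUDA device whose name contains
// `substr`, or -1 if none matches (or no CUDA device is present).
int findDeviceByName(const char *substr)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess &&
            strstr(prop.name, substr) != nullptr)
            return dev;
    }
    return -1;
}

int main()
{
    int dev = findDeviceByName("Tesla");   // assumption: select the card by its name
    if (dev < 0) { printf("no Tesla device found\n"); return 1; }
    cudaSetDevice(dev);                    // later allocations and launches use this device

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    printf("using device %d: %s (%d multiprocessors)\n",
           dev, prop.name, prop.multiProcessorCount);
    return 0;
}
```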

DenisR · March 19, 2008, 4:53pm

You have no control over how many multiprocessors are used to run your code. All of them are always used, as long as your kernel launch requests enough blocks.

Electro · March 19, 2008, 7:43pm

So, to keep the GPU processors from sitting idle, I should create enough blocks. I see, thanks for the answer!

seibert · March 19, 2008, 11:54pm

Yes, and you should use lots of them. Having more blocks than multiprocessors allows the block scheduler to hide the effects of memory latency.
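A sketch of sizing the grid from the hardware, assuming the current CUDA runtime API: query `multiProcessorCount`, launch several blocks per multiprocessor, and use a grid-stride loop so the kernel stays correct whatever grid size is chosen. The factor of 8 blocks per multiprocessor and the helper `chooseGridSize` are hypothetical tuning choices, not a rule; simply launching one thread per element (`(n + 255) / 256` blocks) also gives the block scheduler plenty of blocks when `n` is large.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: correct for any grid size, so the grid can be sized
// to the hardware rather than to the data.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        y[i] = a * x[i] + y[i];
}

// Hypothetical heuristic: aim for `blocksPerSM` blocks on each multiprocessor,
// so the scheduler can switch to another block while one waits on memory,
// but never launch more blocks than there is work for.
int chooseGridSize(int n, int threadsPerBlock, int numSMs, int blocksPerSM)
{
    int needed = (n + threadsPerBlock - 1) / threadsPerBlock;
    int target = numSMs * blocksPerSM;
    int blocks = needed < target ? needed : target;
    return blocks > 0 ? blocks : 1;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) { printf("no CUDA device\n"); return 1; }

    const int n = 1 << 24, tpb = 256;
    int blocks = chooseGridSize(n, tpb, prop.multiProcessorCount, 8);

    float *x = nullptr, *y = nullptr;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));
    saxpy<<<blocks, tpb>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("%d multiprocessors -> %d blocks of %d threads\n",
           prop.multiProcessorCount, blocks, tpb);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```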