Thread Utilization over Thread Instruction Loading

I will post a nice article here on the calculation of a function (called the Protrusion Function in my PhD) that uses the Dijkstra algorithm, presenting two cases. It seems that it is better to underutilize the threads than to load each of them with many instructions, especially with a for loop.
It will come as a PDF attachment and a code attachment. How to compile it should be obvious, though I will include a solution for MS VC and some hints on how to compile it. I can see with joy that this Forum allows attachments. Great.
This article will be open for discussion.

Cheers,
Alexander.

You can see how the above is used with CUDA in the Surface Segmentation algorithm I have created, and you can also see how Fermi performs.

Mesh Segmentation Algorithm (virus free 100%)

It is a 64-bit executable, guaranteed to run smoothly on Windows 7.

Unrar it and go into the ProtBasedSeg64 directory, then type at the command prompt:

    protbasedseg.exe ./armadillo.off

The results are written to the directory ./Parts in a file named armadillo.wrl, which you can visualize with this nice VRML viewer: Vrmlview

On my system (Intel i7-920, 12 GB DDR3, GTX 275) it takes 30 secs to execute, with 9 secs of GPU execution time. I am wondering what the speedup will be with a Fermi card. Please note that your card needs 600 MB of memory to run the program.
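A rough way to bound that speedup is Amdahl's law. The sketch below assumes that the 30 secs include the 9 secs of GPU time and that the remaining 21 secs of CPU-side work stay the same with a new card. Both are assumptions, not measurements, and the function names are made up for illustration.

```cpp
#include <cassert>

// Estimated total run time in seconds if only the GPU portion gets faster.
// cpu_s and gpu_s are the CPU-side and GPU-side times of the current run;
// gpu_speedup is a hypothetical factor for the new card, not a measured value.
double estimated_total(double cpu_s, double gpu_s, double gpu_speedup) {
    assert(gpu_speedup > 0.0);
    return cpu_s + gpu_s / gpu_speedup;
}

// Overall speedup of the whole program relative to the current run.
double overall_speedup(double cpu_s, double gpu_s, double gpu_speedup) {
    return (cpu_s + gpu_s) / estimated_total(cpu_s, gpu_s, gpu_speedup);
}
```

For example, overall_speedup(21.0, 9.0, 2.0) estimates the gain if the GPU part ran twice as fast. Under these assumptions the overall speedup stays below 30/21, about 1.43, however fast the card is, so most of the remaining time would be on the CPU side.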

Also, if you have a machine close to mine, you can experiment with this huge model: Armadillo 1M points

In a PDF which will follow by the end of the week, I will post the problems with the Dijkstra algorithm when it is implemented on the GPU, with the question: how can we make it faster?
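As a common reference point for that question, here is a minimal sequential sketch of Dijkstra in C++. It is not the code used in the Protrusion Function. Edge, Graph and dijkstra are hypothetical names, and an adjacency-list graph with non-negative weights is assumed. The main loop takes one closest vertex at a time from a priority queue, which is typically the step that is hard to spread over many GPU threads.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical graph layout: g[u] lists the outgoing edges of vertex u.
struct Edge {
    int to;
    double weight;  // assumed non-negative
};
using Graph = std::vector<std::vector<Edge>>;

// Returns the shortest distance from source to every vertex
// (infinity for vertices that cannot be reached).
std::vector<double> dijkstra(const Graph& g, int source) {
    assert(source >= 0 && source < static_cast<int>(g.size()));
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(g.size(), inf);

    // Min-heap ordered by tentative distance.
    using Item = std::pair<double, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    dist[source] = 0.0;
    pq.push(Item(0.0, source));

    while (!pq.empty()) {
        const Item top = pq.top();
        pq.pop();
        const double d = top.first;
        const int u = top.second;
        if (d > dist[u]) continue;  // stale entry, a shorter path was already found

        // Relax every edge leaving u.
        for (const Edge& e : g[u]) {
            const double nd = d + e.weight;
            if (nd < dist[e.to]) {
                dist[e.to] = nd;
                pq.push(Item(nd, e.to));
            }
        }
    }
    return dist;
}
```

A baseline like this is also handy for checking the distances produced by a GPU version on small test graphs before timing the large models.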

In general, I see forums as a place to dynamically exchange ideas and creativity, and this Forum seems serious enough to proceed with this.

Best,
Alexander.