Sort-of MPI on the Tesla: development of high-level routines

When I first used an 8-processor Silicon Graphics server (don’t ask the price) some
20 years ago, I was the first one in the lab to have parallel applications up and running,
thanks to the C$DOACROSS directive (yep, in Fortran!). When Beowulf clusters became
affordable, some 10 years ago, I was not very happy when I first looked at MPI (Message
Passing Interface) programming, but after gaining some experience I noticed that I was
always using the same communication structures, so I created my own high-level library on
top of MPI (google: SPMDlib) and, keeping in mind C$DOACROSS and its current standard
OpenMP, my own set of directives (google: SPMDdir). By the way, the parser I developed still
needs work, and the project is dead thanks to multi-core processors.
So now I have two options: stick to multi-core CPUs with OpenMP, or invest a lot of time
in developing higher-level CUDA routines that cover a limited selection of functionality.
Porting individual applications is not an issue when doing research in image processing.
Are there any people with a similar interest?
Hans

dubuf,

I am working on hardware loop unrollers (with OpenMP-like semantics) for mapping loops to a high-level GPU programming paradigm. What exactly do you have in mind?