Open Source CUDA Libraries for common calculations

Would anyone here be interested in creating a CUDA open-source library project? I’ve got tons of ideas for stuff to program, more than I’ll ever have time to do on my own, and I’d like to be able to give something back to the CUDA community for all the help they’ve given me.

So, I’m thinking of starting an open-source CUDA library that will handle common tasks for developers that want to speed up their applications with CUDA, but don’t want to have to learn the specifics of CUDA programming themselves. I’ve drawn up a rough outline for things I’d like to put into the library, which I’ll be glad to show anyone who is interested in helping out.

I figure I’ll register a new project on SourceForge once I get enough of the functions working to be worthwhile.

Anyone else interested? PM me if you want more specific info on the project, or if you’re interested in contributing code. Though I’m aiming to keep this library somewhat generic, if you know of some specialty algorithms in your field, let’s add those too!

Maybe you could contribute to the CUDPP project? That is exactly such a library (although a bit underexposed, I am afraid). Other than that, I would be willing to contribute to such a project.

CUDPP looks neat, but it is still somewhat low-level. Considering the amount of time it has taken me even to learn the basics of CUDA, I would like to create a library so that any programmer can invoke CUDA functionality with ease.

For example (C# pseudo-code):

float[] myArray = new float[] { 1.23871f, 1.89771f, 8.2971981f, … };

float minElement = CUDALib.Min(myArray);

Underneath, the library can be as complex/optimized as possible, but the invoking programmer won’t have to know anything about it. Though this only allows them to use CUDA for a small subset of their programming tasks (i.e. instead of writing a custom kernel for their specific project), they could easily use it for small, common, CPU-intensive tasks such as sorts, reductions, parallel mathematical operations, etc.

I also think doing a library like this could expose CUDA to a much wider audience, since they could gain some benefit from doing GPU calculations without really having to do much extra coding.

I understand your point, and a nice example of such a library is Jacket, a toolbox for MATLAB. I think, however, that making such a library in C is a daunting task if you do not want to transfer all the data back and forth between CPU and GPU all the time.

I’ve played around with Jacket a bit (looks great!), but again, I’d like to make a ‘drop-in’ library for the everyday programmer. I’m going to focus on things that can parallelize well, like sorting, reductions, applying a certain function to every element in an array, etc.; hopefully the ones who will be using the library will at least understand when and where it can help them, instead of just slapping it on everything.

While I don’t have too much time (I do use CUDA to make a living, writing code for other people), I would be happy to contribute to such a project. My personal preference is Python, but I can write C and C++. There are already (semi)commercial offerings along the lines you suggest - gpuLib, Jacket - but open-source, freely available CUDA code sounds like a good idea.

I agree that having a nice templated reduction, sorting and other common algorithms would be great. It would complement the other often used elements (like BLAS or FFT).

The reduction example is completely templated nowadays, as far as I remember.

I’m still working on the list, but I’m going to try to focus on three major areas to start:

  • Reduction Kernels

  • Numerical Analysis

  • Signal Processing

Maybe if I have a few minutes this weekend, I’ll get the project set up at SourceForge and stub out the basic interfaces. I would be grateful for anyone to help me with the above, or add your own specialty area to the list.

I would suggest making sure that each library call also has a non-CUDA counterpart, which is called if CUDA is unavailable. That makes the whole thing a lot easier to distribute.

That’s the plan… later on down the road, I may also try to recruit someone to write code for that, ahem, other GPU vendor so that everyone gets as much acceleration as possible. However, since I have an NVIDIA card, that’s what gets support first.

I would suggest using the new --multicore target that is coming in 2.1 to utilize multicore CPU systems, then you only have to write CUDA code ;)

I’d be interested in contributing what I can.
I’m a mathematician, so I’d probably be most useful with the Numerical Analysis stuff.
Specifically I’d be interested in Differential Equations.

Let me know.

Well I for one would really like to use something like this. I like to play with image files in VB.net or VC++.net and CUDA would really speed things up BUT I am not a pro software developer. Just getting started seems pretty daunting to me. A really basic, handholding guide that tells me how to say, subtract two matrices using CUDA in VC++.NET would be AWESOME. You are right. If this tech were made a little more accessible, a LOT more people would use it. I’m going to follow this thread with anticipation. Thanks.

I am brand new to GPGPU and C++ programming.
I spend a lot of time reading these CUDA forums.
Something like this would be great for people who don’t have the math down.
I love math, but I suck at algebra… I’m pretty good with right-angle trig, but I have never written an algorithm in my life.
Anything that can help the noobs definitely helps the GPGPU community a lot.

Can’t wait for some open algorithm libraries.

.NET is IMHO not a good idea. Keep it simple and stupid: write it in C and provide wrappers for .NET/C++, etc…

For every kernel I write, I export a C++ interface (templated, so it works with double/float/int), and if I ever need C, I will write wrappers just like CUTIL does…

Yeah, it’ll be written in C and I’ll provide a .NET wrapper. I do 99% of my development in C# these days, so I’ll need a wrapper for my own purposes anyway.

I’ve just gotten CUDA working on my development box (I was having some trouble with VC++ Express, so I just had to use VS2005), so hopefully I’ll be able to get working on this within the next week or so!

Yep, I’d be interested in something like this. I already modified the reduction sample for my personal use so that it actually works for arrays of other than power-of-two sizes (at least the old 1.1 version didn’t), and so that it can compute the sum, min, max, or minmax of an array of n-vectors (min and max are component-wise for n != 1, i.e. max([1, 2], [3, 1]) = [3, 2]).

But I’d also like something different that no one seems to be doing at the moment:

Kernel-side libraries.

I have just spent a huge amount of time writing complex-number, small-matrix, and small-vector routines that work entirely inside kernels, in order to have a sane development environment. For example, just the usual 3x3 complex matrix multiply is a surprisingly large number of lines of code, and then of course you want all the variants where either of the matrices is taken as its Hermitian adjoint, and so forth. It would have been extremely helpful to have ready implementations for complex numbers, for real and complex 1-, 2-, 3-, and 4-vectors, and for optimized versions of all the related utility functions. The library itself could just be a bunch of header files implementing these classes with kernel-side inline functions.

Yeah, a sort of Boost for CUDA would be very cool.