An easy-to-use CUDA library

I have written some code to make CUDA easier to use,

and I want to know: is anyone interested in my project?

e.g.

__device__ void init_xy(vfloat x, vfloat y, int i) {
    x(i) = i;
    y(i) = i*i;
}

int main() {
    const int N = 1024;

    // vfloat is short for vector<float>; create 2 vectors in device memory, each of size N.
    device::vfloat x(N), y(N);

    // The syntax is "cuda::foreach(funcname, args...)(parallel_dims...)";
    // it will call "__device__ init_xy(vfloat x, vfloat y, int i)", and i will range over [0...N-1].
    cuda::foreach(init_xy, x, y)(N);

    float a = 2.0f;

    // Like saxpy in cuBLAS, and it's as fast as saxpy.
    y = a*x + y;

    // host::vfloat is short for host::vector<float>. y[0|_|10] is like the slice y(1:10)
    // in MATLAB: it copies the first 10 elements of y to the host, and it's as fast as
    // "cudaMemcpy(hx, y, 10*sizeof(float), cudaMemcpyDeviceToHost)".
    host::vfloat hx = y[0|_|10];
}
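To give an idea of what happens under the hood: cuda::foreach is essentially sugar over a generic kernel that hands each thread its global index. A simplified sketch, with raw pointers instead of vfloat and only one dimension (the real code is more general):

// The user's __device__ function is baked in as a compile-time template
// parameter; every thread passes its own global index to it.
template <void (*F)(float*, float*, int)>
__global__ void foreach_kernel(float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        F(x, y, i);   // e.g. a raw-pointer equivalent of init_xy
}

// cuda::foreach(init_xy, x, y)(N) then boils down to something like:
//   foreach_kernel<init_xy_raw><<<(N + 255) / 256, 256>>>(px, py, N);
// (init_xy_raw, px, py are hypothetical names for this sketch.)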

And there are more interesting things in my project.
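One of them: the 0|_|10 range syntax above can be built with plain operator overloading on a placeholder object. A sketch of the idea (the names here are illustrative, not the library's real ones):

// "_" is a global placeholder; 0|_ builds a lower bound, and (0|_)|10
// completes a range object that device::vector's operator[] understands.
struct placeholder {};
constexpr placeholder _{};

struct lower_bound { int from; };
struct range { int from, to; };

inline lower_bound operator|(int from, placeholder) { return lower_bound{from}; }
inline range operator|(lower_bound lb, int to)      { return range{lb.from, to}; }

// device::vector<T>::operator[](range) can then copy [from, to) back to the host.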

Sounds more like Thrust… The operator overloading thing looks good. Not sure if Thrust supports such a thing… This in fact reminds me of Jacket CUDA - which did a similar thing for MATLAB long back.
Anyway,
Good thinking and the right way to approach programming! Congrats!
Release it on the net and you will see how people like it.

axpy in Thrust:

#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>

template <typename T>
struct axpy
{
    T a;

    axpy(T a) : a(a) {}

    __host__ __device__
    T operator()(T x, T y) const
    {
        return a * x + y;
    }
};

template <typename Vector>
void axpy_fast(const typename Vector::value_type a, const Vector& x, Vector& y)
{
    typedef typename Vector::value_type T;

    // y <- a * x + y in a single fused pass
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), axpy<T>(a));
}

template <typename Vector>
void axpy_slow(const typename Vector::value_type a, const Vector& x, Vector& y)
{
    typedef typename Vector::value_type T;

    // temp <- a
    Vector temp(x.size(), a);

    // temp <- a * x
    thrust::transform(x.begin(), x.end(), temp.begin(), temp.begin(), thrust::multiplies<T>());

    // y <- a * x + y
    thrust::transform(temp.begin(), temp.end(), y.begin(), y.begin(), thrust::plus<T>());
}
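For completeness, a minimal call site for the Thrust version might look like this (size and fill values are made up):

#include <thrust/device_vector.h>

int main() {
    const int N = 1024;
    thrust::device_vector<float> x(N, 1.0f);  // x = [1, 1, ..., 1]
    thrust::device_vector<float> y(N, 2.0f);  // y = [2, 2, ..., 2]
    axpy_fast(2.0f, x, y);                    // y <- 2*x + y, so y = [4, 4, ..., 4]
    return 0;
}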

axpy in my project:

template <typename T>
void axpy_fast(const T& a, const vector<T>& x, vector<T>& y) {
    y = a*x + y;
}

Thanks for your reply.

I think Jacket is somewhat slow…

Here is a matrix example:

const int NA = 1024;
const int NR = 1024;

device::mfloat m(NA, NR);   // an NA-by-NR matrix in device memory
device::vfloat v(NA);

// m[_, i] selects column i; each column of m is set to v scaled by i.
for (int i = 0; i < NR; ++i)  m[_, i] = v * i;
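For comparison, a rough guess at what a hand-written CUDA equivalent of that loop could look like, assuming column-major storage (element (r, c) at m[r + c*NA]):

__global__ void fill_columns(float* m, const float* v, int na, int nr) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;   // row index within a column
    int c = blockIdx.y;                              // one grid row per column
    if (r < na && c < nr)
        m[r + c * na] = v[r] * c;                    // m[_, c] = v * c
}

// launch: fill_columns<<<dim3((NA + 255) / 256, NR), 256>>>(pm, pv, NA, NR);
// (pm, pv are hypothetical device pointers for this sketch.)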

No doubt, Thrust looks very complicated next to that! Good work!
I want to see what the Thrust guys (Jared Ha…ck, for instance - I don't remember the full name) say about it.
They would know it better than I do.

Lumpy, does your signature suggest that only Fools can use your tool? :-)
Well, I would like to stay away :-)

Well, I only meant GNU Autotools.

Why not SCons or waf?

Er, I'd like to make it open source, but it's still under development…

I’m interested. I assume this is your repo?

Looks like you know what you’re doing… keep up the good work!

Er… Google is powerful…

I'm glad to hear that someone is interested.

If more people like it, I'll be happy to keep working on it.

Actually, you can also easily build this sort of thing on top of Thrust. Simply wrap begin/end iterators in a range and you can define all transformations and operations on it. Using fancy iterators provides the lazy evaluation, and you can simply transplant all existing Thrust algorithms onto them.

Something like:

DeviceVector<int> a = sequence(0, n);  // Uses counting_iterator.
DeviceVector<int> b = n - sequence(0, n);
DeviceVector<int> c = a + b;           // Short for zip(a, b).transform(wrap(thrust::plus<int>())).
int sum = reduce(a * b + c);
bool comp = all(c == n);
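For instance, a fused reduce over an "a + b" view can already be written with stock Thrust (the DeviceVector/sequence names above are made up; this sketch uses only real Thrust pieces):

#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/reduce.h>
#include <thrust/tuple.h>

// Unpacks an (x, y) tuple and adds the parts; applied lazily per element.
struct tuple_plus {
    __host__ __device__
    int operator()(thrust::tuple<int, int> t) const {
        return thrust::get<0>(t) + thrust::get<1>(t);
    }
};

int sum_of_a_plus_b(const thrust::device_vector<int>& a,
                    const thrust::device_vector<int>& b) {
    // A lazy "a + b" view: no temporary vector is ever materialized.
    auto first = thrust::make_transform_iterator(
        thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())),
        tuple_plus());
    return thrust::reduce(first, first + a.size());  // reduce(a + b)
}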

But I'm unsure how well the compiler actually handles this level of abstraction.

This is definitely an interesting topic, and I appreciate your work.
