An easy-to-use CUDA library

I have written some code to make CUDA easier to use,

and I want to know: is anyone interested in my project?

e.g.

__device__ void init_xy(vfloat x, vfloat y, int i) {
    x(i) = i;
    y(i) = i*i;
}

int main() {
    const int N = 1024;

    // vfloat is short for vector<float>; create 2 vectors in device memory, each of size N.
    device::vfloat x(N), y(N);

    // The syntax is "cuda::foreach(funcname, args...)(parallel_dims...)";
    // it will call "__device__ init_xy(vfloat x, vfloat y, int i)", and i will range over [0...N-1].
    cuda::foreach(init_xy, x, y)(N);

    float a = 2.0f;

    // Like saxpy in cuBLAS, and it's as fast as saxpy.
    y = a*x + y;

    // host::vfloat is short for host::vector<float>. y[0|_|10] is like the slice y(1:10)
    // in MATLAB: it copies the first 10 elements of y to the host, and it's as fast as
    // "cudaMemcpy(hx, y, 10*sizeof(float), cudaMemcpyDeviceToHost)".
    host::vfloat hx = y[0|_|10];
}
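To give an idea of what happens under the hood: cuda::foreach is essentially sugar over a generic kernel that hands each thread its global index. A simplified sketch, with raw pointers instead of vfloat and only one dimension (the real code is more general):

// The user's __device__ function is baked in as a compile-time template
// parameter; every thread passes its own global index to it.
template <void (*F)(float*, float*, int)>
__global__ void foreach_kernel(float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        F(x, y, i);   // e.g. a raw-pointer equivalent of init_xy
}

// cuda::foreach(init_xy, x, y)(N) then boils down to something like:
//   foreach_kernel<init_xy_raw><<<(N + 255) / 256, 256>>>(px, py, N);
// (init_xy_raw, px, py are hypothetical names for this sketch.)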

And there are more interesting things in my project.
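One of them: the 0|_|10 range syntax above can be built with plain operator overloading on a placeholder object. A sketch of the idea (the names here are illustrative, not the library's real ones):

// "_" is a global placeholder; 0|_ builds a lower bound, and (0|_)|10
// completes a range object that device::vector's operator[] understands.
struct placeholder {};
constexpr placeholder _{};

struct lower_bound { int from; };
struct range { int from, to; };

inline lower_bound operator|(int from, placeholder) { return lower_bound{from}; }
inline range operator|(lower_bound lb, int to)      { return range{lb.from, to}; }

// device::vector<T>::operator[](range) can then copy [from, to) back to the host.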

Sounds more like Thrust… The operator overloading thing looks good. Not sure if Thrust supports such a thing… This in fact reminds me of Jacket CUDA - which did a similar thing for MATLAB long back.
Anyway,
Good thinking and the right way to approach programming! Congrats!
Release it on the net and you will see how people like it.

axpy in Thrust:

#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>

template <typename T>
struct axpy
{
    T a;

    axpy(T a) : a(a) {}

    __host__ __device__
    T operator()(T x, T y) const
    {
        return a * x + y;
    }
};

template <typename Vector>
void axpy_fast(const typename Vector::value_type a, const Vector& x, Vector& y)
{
    typedef typename Vector::value_type T;

    // y <- a * x + y in a single fused pass
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), axpy<T>(a));
}

template <typename Vector>
void axpy_slow(const typename Vector::value_type a, const Vector& x, Vector& y)
{
    typedef typename Vector::value_type T;

    // temp <- a
    Vector temp(x.size(), a);

    // temp <- a * x
    thrust::transform(x.begin(), x.end(), temp.begin(), temp.begin(), thrust::multiplies<T>());

    // y <- a * x + y
    thrust::transform(temp.begin(), temp.end(), y.begin(), y.begin(), thrust::plus<T>());
}
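For completeness, a minimal call site for the Thrust version might look like this (size and fill values are made up):

#include <thrust/device_vector.h>

int main() {
    const int N = 1024;
    thrust::device_vector<float> x(N, 1.0f);  // x = [1, 1, ..., 1]
    thrust::device_vector<float> y(N, 2.0f);  // y = [2, 2, ..., 2]
    axpy_fast(2.0f, x, y);                    // y <- 2*x + y, so y = [4, 4, ..., 4]
    return 0;
}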

axpy in my project:

template <typename T>
void axpy_fast(const T& a, const vector<T>& x, vector<T>& y) {
    y = a*x + y;
}

Thanks for your reply.

I think Jacket is somewhat slow…

Here is a matrix example:

const int NA = 1024;
const int NR = 1024;

device::mfloat m(NA, NR);   // an NA-by-NR matrix in device memory
device::vfloat v(NA);

// m[_, i] selects column i; each column of m is set to v scaled by i.
for (int i = 0; i < NR; ++i)  m[_, i] = v * i;
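For comparison, a rough guess at what a hand-written CUDA equivalent of that loop could look like, assuming column-major storage (element (r, c) at m[r + c*NA]):

__global__ void fill_columns(float* m, const float* v, int na, int nr) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;   // row index within a column
    int c = blockIdx.y;                              // one grid row per column
    if (r < na && c < nr)
        m[r + c * na] = v[r] * c;                    // m[_, c] = v * c
}

// launch: fill_columns<<<dim3((NA + 255) / 256, NR), 256>>>(pm, pv, NA, NR);
// (pm, pv are hypothetical device pointers for this sketch.)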

No doubt, Thrust looks very complicated next to that! Good work!
I want to see what the Thrust guys (Jared Ha…ck, for instance - I don't remember the full name) say about it.
They would know it better than I do.

Lumpy, does your signature suggest that only Fools can use your tool? :-)
Well, I would like to stay away :-)

Well, I only meant GNU Autotools.

Why not SCons or waf?

Er, I'd like to make it open source, but it's still under development…

I’m interested. I assume this is your repo?

Looks like you know what you’re doing… keep up the good work!

Er… Google is powerful…

I'm glad to hear that someone is interested.

If more people like it, I'll be happy to keep working on it.

Actually, you can also easily build this sort of thing on top of Thrust. Simply wrap begin/end iterators in a range and you can define all transformations and operations on it. Using fancy iterators provides the lazy evaluation, and you can simply transplant all existing Thrust algorithms onto them.

Something like:

DeviceVector<int> a = sequence(0, n);  // Uses counting_iterator.
DeviceVector<int> b = n - sequence(0, n);
DeviceVector<int> c = a + b;           // Short for zip(a, b).transform(wrap(thrust::plus<int>())).
int sum = reduce(a * b + c);
bool comp = all(c == n);
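For instance, a fused reduce over an "a + b" view can already be written with stock Thrust (the DeviceVector/sequence names above are made up; this sketch uses only real Thrust pieces):

#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/reduce.h>
#include <thrust/tuple.h>

// Unpacks an (x, y) tuple and adds the parts; applied lazily per element.
struct tuple_plus {
    __host__ __device__
    int operator()(thrust::tuple<int, int> t) const {
        return thrust::get<0>(t) + thrust::get<1>(t);
    }
};

int sum_of_a_plus_b(const thrust::device_vector<int>& a,
                    const thrust::device_vector<int>& b) {
    // A lazy "a + b" view: no temporary vector is ever materialized.
    auto first = thrust::make_transform_iterator(
        thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin())),
        tuple_plus());
    return thrust::reduce(first, first + a.size());  // reduce(a + b)
}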

But I'm unsure how well the compiler actually handles this level of abstraction.

This is definitely an interesting topic, and I appreciate your work.
