I want to know: is anyone interested in my project?
For example:
__device__ void init_xy(vfloat x, vfloat y, int i) {
    x(i) = i;
    y(i) = i * i;
}

int main() {
    const int N = 1024;
    // vfloat is short for vector<float>; this creates 2 vectors in device memory, each of size N.
    device::vfloat x(N), y(N);
    // The syntax is "cuda::foreach(funcname, args...)(parallel_dims...)".
    // This call runs "__device__ init_xy(vfloat x, vfloat y, int i)" with i taking the values [0...N-1].
    cuda::foreach(init_xy, x, y)(N);
    float a = 2.0f;
    // Like saxpy in cuBLAS, and it's as fast as saxpy.
    y = a * x + y;
    // host::vfloat is short for host::vector<float>. y[0|_|10] is like MATLAB's y(1:10):
    // it copies the first 10 elements of y, as fast as a raw
    // cudaMemcpy(dst, src, 10 * sizeof(float), cudaMemcpyDeviceToHost).
    host::vfloat hx = y[0|_|10];
}
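For anyone wondering what a call like cuda::foreach(init_xy, x, y)(N) boils down to, here is a rough plain-CUDA sketch of the general pattern. The kernel and launcher names and the 256-thread block size are my own assumptions, not the project's actual internals:

// Hypothetical expansion of cuda::foreach(init_xy, x, y)(N).
__global__ void init_xy_kernel(float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] = i;                 // same body as init_xy above
        y[i] = (float)i * i;
    }
}

void launch_foreach(float* x, float* y, int n) {
    const int block = 256;                     // illustrative block size
    const int grid = (n + block - 1) / block;  // enough blocks to cover n
    init_xy_kernel<<<grid, block>>>(x, y, n);
}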
And there are more interesting things in my project.
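One claim worth spelling out is the saxpy comparison above: for y = a*x + y to run as fast as saxpy, the overloaded operators presumably build a lightweight expression object that assignment then evaluates in a single kernel, i.e. the classic expression-template technique. A rough sketch of that idea, with my own illustrative names (AxpyExpr, eval_kernel), not the project's actual code:

// a*x + y captured lazily: operator* and operator+ would build this
// object instead of computing anything.
struct AxpyExpr {
    float a;
    const float* x;
    const float* y;
    __device__ float operator()(int i) const { return a * x[i] + y[i]; }
};

// Assignment walks the expression in one global-memory pass, like saxpy.
template <typename Expr>
__global__ void eval_kernel(float* out, Expr e, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = e(i);
}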
Sounds more like Thrust… The operator overloading looks good; not sure if Thrust supports such a thing… This in fact reminds me of Jacket, which did a similar thing for MATLAB long back.
Anyway, good thinking and the right way to approach programming! Congrats!
Release it on the net and you will see how people like it.
No doubt, Thrust looks very complicated by comparison! Good work!
I want to see what the Thrust guys (Jared Ha…ck, for instance; I don't remember the full name) say about it.
They would know it better than I do.
Actually, you can easily build this sort of thing on top of Thrust as well. Simply wrap begin/end iterators in a range, and you can define all transformations and operations on it. Fancy iterators provide the lazy evaluation, and you can transplant all existing Thrust algorithms onto them.
Something like:
DeviceVector<int> a = sequence(0, n);      // Uses counting_iterator: a[i] = i.
DeviceVector<int> b = n - sequence(0, n);  // b[i] = n - i.
DeviceVector<int> c = a + b;               // Short for zip(a, b).transform(wrap(thrust::plus<int>())).
int sum = reduce(a * b + c);               // The whole expression evaluates lazily in one pass.
bool comp = all(c == n);                   // True here, since c[i] = i + (n - i) = n.
But I'm unsure how well the compiler actually handles this level of abstraction.
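For reference, the sum = reduce(a * b + c) line can already be written with stock Thrust fancy iterators, and the iterator composition gives exactly the lazy, single-pass evaluation described above. A minimal self-contained sketch; the functor name fma_op is mine:

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/tuple.h>

// Fuses a[i] * b[i] + c[i] into one step per element.
struct fma_op {
    __host__ __device__
    int operator()(const thrust::tuple<int, int, int>& t) const {
        return thrust::get<0>(t) * thrust::get<1>(t) + thrust::get<2>(t);
    }
};

int main() {
    const int n = 1024;
    thrust::device_vector<int> a(n), b(n), c(n);
    thrust::sequence(a.begin(), a.end());    // a[i] = i
    thrust::sequence(b.rbegin(), b.rend());  // b[i] = n - 1 - i
    thrust::fill(c.begin(), c.end(), 1);     // c[i] = 1

    // A lazy view of a*b + c: nothing is materialized until reduce
    // traverses it, so the whole reduction runs as one fused kernel.
    auto first = thrust::make_transform_iterator(
        thrust::make_zip_iterator(
            thrust::make_tuple(a.begin(), b.begin(), c.begin())),
        fma_op());
    int sum = thrust::reduce(first, first + n);

    std::cout << "sum = " << sum << std::endl;
    return 0;
}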
This is definitely an interesting topic. I appreciate your work as well.