Hello, I’m using CUBLAS and I wonder how to access individual variables.
For example, I have a matrix on the device. How can I compare or change certain elements of it?
Thx :)
hi,
The first thing you have to do is define a grid of thread blocks and the number of threads in each block.
That’s nicely explained in the CUDA programming manual, but perhaps it becomes clearer with a little example.
If you want to multiply each element of a matrix by the matching element of a filter matrix (which has the same size as your input matrix), you could do it with the following code:
__global__ void Apply_Filter_Kernel (float *data, float *filter, int width) {
    // one block per row, one thread per column
    int offset = blockIdx.x * width + threadIdx.x;
    data[offset] *= filter[offset];
}
extern "C" void Apply_Filter (float *data, float *filter, int width, int height) {
    // data and filter must be device pointers; launches height blocks of width threads each
    Apply_Filter_Kernel<<<height, width>>> (data, filter, width);
}
the output would be:
d1*f1 d2*f2 d3*f3
d4*f4 d5*f5 d6*f6
d7*f7 d8*f8 d9*f9
with di: matrix elements, fi: filter elements
width should be a multiple of the warp size (32) to achieve good performance, and it must not exceed the maximum number of threads per block (1024 on current GPUs, fewer on older ones).
In this example the grid is a 1D grid of height blocks. Each block contains width threads.
Now the offset of each element is just: blockIdx.x * width + threadIdx.x
The code above makes each thread compute one element.
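If your matrix sizes are not multiples of 32, or a row is longer than one block allows, the same idea can be written with a 2D grid and a bounds check. This is only a sketch: the names apply_filter_2d and apply_filter_2d_host and the 32x8 block shape are my own choices, and it assumes row-major storage like the example above.

```cuda
#include <cuda_runtime.h>

// Multiply each element of data by the matching element of filter.
// One thread per element; the bounds check allows arbitrary width and height.
__global__ void apply_filter_2d(float *data, const float *filter,
                                int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        int offset = row * width + col;   // row-major layout (assumption)
        data[offset] *= filter[offset];
    }
}

// Hypothetical host wrapper: d_data and d_filter must be device pointers.
extern "C" cudaError_t apply_filter_2d_host(float *d_data, const float *d_filter,
                                            int width, int height)
{
    dim3 block(32, 8);   // 256 threads per block, a multiple of the warp size
    dim3 grid((width  + block.x - 1) / block.x,    // round up so every
              (height + block.y - 1) / block.y);   // element is covered
    apply_filter_2d<<<grid, block>>>(d_data, d_filter, width, height);
    cudaError_t err = cudaGetLastError();          // catches launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();                // catches execution errors
}
```

The threads that fall outside the matrix simply do nothing, so you pay a little for idle threads in the last blocks but no longer depend on the matrix dimensions.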
cheers,
xlro
Thank you very much, but I’m still not clear on how to change the value of a certain element in a matrix, especially inside branches.
Suppose DA is a matrix on the device and a, b, c are elements of DA.
How can I carry out this kind of statement:
if(a>b)
c=0;
else c=1;
THX
LuY.
For a single value this is something you should better do on the CPU.
If you want to do this on the GPU, you need to write a CUDA kernel, because CUBLAS itself only provides BLAS routines. (I’m also using both CUDA and CUBLAS.)
In CUDA you could do this with a grid of size 1 and 1 thread per block:
__global__ void Condition_Kernel(float *data, int aOffset, int bOffset, int cOffset) {
    // the offsets are the linear positions of a, b and c inside the matrix
    if (data[aOffset] > data[bOffset])
        data[cOffset] = 0;
    else
        data[cOffset] = 1;
}
extern "C" void Condition (float *data, int aOffset, int bOffset, int cOffset) {
    // data must be a device pointer; a single thread does the comparison
    Condition_Kernel<<<1, 1>>> (data, aOffset, bOffset, cOffset);
}
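For comparison, the CPU variant mentioned above could look roughly like this: copy the two elements to the host, decide there, and write the result back. It is a sketch, not the only way to do it. condition_on_host and idx_colmajor are names I made up, and the offset helper assumes the matrix is stored column-major with leading dimension ld, which is the convention CUBLAS uses.

```cuda
#include <cuda_runtime.h>

// Linear offset of element (row, col), 0-based, in a column-major matrix
// with leading dimension ld (assumption: CUBLAS-style storage).
static inline int idx_colmajor(int row, int col, int ld)
{
    return col * ld + row;
}

// Hypothetical helper: evaluates "if (a > b) c = 0; else c = 1;" on the host
// for three elements of the device matrix d_data.
extern "C" cudaError_t condition_on_host(float *d_data,
                                         int aOffset, int bOffset, int cOffset)
{
    float a, b;
    cudaError_t err = cudaMemcpy(&a, d_data + aOffset, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(&b, d_data + bOffset, sizeof(float),
                     cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) return err;

    float c = (a > b) ? 0.0f : 1.0f;   // the branch itself runs on the CPU

    return cudaMemcpy(d_data + cOffset, &c, sizeof(float),
                      cudaMemcpyHostToDevice);
}
```

Every call costs three small transfers over the bus, so this only makes sense for an occasional single value. For hundreds of such decisions the transfers will dominate.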
Yes, I’m porting a high-speed modem application to the GPU, and the variables are so unordered that I cannot easily handle them on the GPU. But thank you all the same. I wonder whether there is a simpler way to do this, or else hundreds of such simple switches and comparisons will wear me out.
Is it necessary to set up an individual function for each operation?
Sorry for my poor English, I’m just a freshman in a non-English-speaking country. :P
If I understand you correctly, then I’d advise you to reorder the variables so that you can process them in parallel. Otherwise the CPU will be faster than the GPU.
If you reorder them well, you can handle all of them with 1 or 2 function calls.
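To make the reordering idea concrete, here is a rough sketch. It assumes you can gather all the "a" values into one array, all the "b" values into a second one, and want the results in a third, so that one kernel launch handles every comparison at once. The names compare_kernel and compare_all and the block size of 256 are my own choices.

```cuda
#include <cuda_runtime.h>

// For every i in [0, n): c[i] = 0 if a[i] > b[i], otherwise 1.
// One thread per comparison.
__global__ void compare_kernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = (a[i] > b[i]) ? 0.0f : 1.0f;
}

// Hypothetical host wrapper: d_a, d_b and d_c are device arrays of length n.
extern "C" cudaError_t compare_all(const float *d_a, const float *d_b,
                                   float *d_c, int n)
{
    if (n <= 0) return cudaSuccess;                  // nothing to do
    const int threads = 256;                         // multiple of the warp size
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n
    compare_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaError_t err = cudaGetLastError();            // catches launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();                  // catches execution errors
}
```

Because every thread runs the same short comparison on neighbouring elements, the memory accesses are contiguous and the branch costs very little. Hundreds of separate one-thread kernels would spend almost all of their time on launch overhead instead.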