I have the following kernel:

```
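// type selects the comparison at compile time: 0 -> '>', 1 -> '<', 2 -> '=='
template <int type>
__global__ void vecop_kernel(float *out, const float *lhs, const float *rhs, int len, bool *idx)
{
    // 'out' is not used by this kernel; 'idx' must be non-const because it is written to.
    const int off  = threadIdx.x + blockDim.x * blockIdx.x;  // global thread index
    const int skip = gridDim.x * blockDim.x;                 // grid-stride step
    for (int i = off; i < len; i += skip)
    {
        if (type == 0) idx[i] = (lhs[i] >  rhs[i]);
        if (type == 1) idx[i] = (lhs[i] <  rhs[i]);
        if (type == 2) idx[i] = (lhs[i] == rhs[i]);
    }
}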
```

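Depending on the template parameter `type`, it computes one of the following element-wise comparisons:

idx = lhs > rhs

idx = lhs < rhs

or

idx = lhs == rhs

where `idx`, `lhs`, and `rhs` are all vectors of length `len`.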
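To make the intended semantics concrete, here is a minimal sketch of how a kernel like this might be driven from the host. The wrapper name `vecop_host`, the block size of 256, and passing `nullptr` for the unused `out` argument are assumptions for illustration only. `idx` is declared non-const in this copy so the kernel can write to it, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Copy of the kernel from the question; idx is non-const so it can be written.
template <int type>
__global__ void vecop_kernel(float *out, const float *lhs, const float *rhs, int len, bool *idx)
{
    const int off  = threadIdx.x + blockDim.x * blockIdx.x;  // global thread index
    const int skip = gridDim.x * blockDim.x;                 // grid-stride step
    for (int i = off; i < len; i += skip)
    {
        if (type == 0) idx[i] = (lhs[i] >  rhs[i]);
        if (type == 1) idx[i] = (lhs[i] <  rhs[i]);
        if (type == 2) idx[i] = (lhs[i] == rhs[i]);
    }
}

// Hypothetical host wrapper (not part of the original code).
// Assumes rhs has at least as many elements as lhs.
// Returns one byte per element: 1 where the comparison holds, 0 otherwise.
template <int type>
std::vector<char> vecop_host(const std::vector<float> &lhs, const std::vector<float> &rhs)
{
    static_assert(sizeof(bool) == 1, "flags are copied back byte-for-byte");
    const int len = static_cast<int>(lhs.size());
    std::vector<char> flags(len, 0);
    if (len == 0) return flags;

    // Allocate device buffers and upload both inputs.
    const size_t bytes = len * sizeof(float);
    float *d_lhs = nullptr, *d_rhs = nullptr;
    bool  *d_idx = nullptr;
    cudaMalloc(&d_lhs, bytes);
    cudaMalloc(&d_rhs, bytes);
    cudaMalloc(&d_idx, len * sizeof(bool));
    cudaMemcpy(d_lhs, lhs.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_rhs, rhs.data(), bytes, cudaMemcpyHostToDevice);

    // Assumed launch configuration: 256 threads per block, enough blocks to cover len.
    const int threads = 256;
    const int blocks  = (len + threads - 1) / threads;
    vecop_kernel<type><<<blocks, threads>>>(nullptr, d_lhs, d_rhs, len, d_idx);

    // This blocking copy on the default stream also waits for the kernel to finish.
    cudaMemcpy(flags.data(), d_idx, len * sizeof(bool), cudaMemcpyDeviceToHost);

    cudaFree(d_lhs);
    cudaFree(d_rhs);
    cudaFree(d_idx);
    return flags;
}
```

With `lhs = {1, 2, 3, 4}` and `rhs = {4, 2, 1, 5}`, calling `vecop_host<0>(lhs, rhs)` should give the flags `0, 0, 1, 0`.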

Would there be a more efficient way to code the kernel?

Cheers.