Hello,
I’m trying to optimize my program with CUDA. The program uses a lot of Eigen structures, and only parts of it should run on the GPU. Some of the calculations on a Matrix3Xf are column-wise (colwise), and I’m having trouble implementing them on the GPU together with other array operations.
__global__ void test(float* x, float* y, float param, int colSize){ // x holds the data of the Matrix3Xf
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if ((id % 3 == 0) && (id < colSize * 3)) // stay within the limits of x
    {
        // Eigen::Matrix3Xf x = x.colwise().normalized();
        float norm = sqrtf(x[id] * x[id] + x[id + 1] * x[id + 1] + x[id + 2] * x[id + 2]);
        x[id]     = x[id] / norm;
        x[id + 1] = x[id + 1] / norm;
        x[id + 2] = x[id + 2] / norm;

        float var = y[id] * param;
        // do something with var and x
        ...
    }
}
If I only allow (id % 3 == 0), I don’t have access to all entries of y. When I try to use float** x instead, I have trouble transferring the Matrix3Xf data from host memory to device memory. Any ideas?
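What I’m considering as an alternative is launching one thread per column instead of one thread per element, and copying the Matrix3Xf to the device directly from mat.data() (it is column-major and stored contiguously). Below is only a rough sketch of that idea, not working code from my project: it assumes y also has 3 * colSize entries, and the helper name runOnGpu and the Eigen::VectorXf type for y are just placeholders.

#include <Eigen/Dense>
#include <cuda_runtime.h>

// One thread per column of the 3xN matrix: each thread owns x[3*col .. 3*col+2]
// and the matching entries of y, so nothing is skipped and no two threads
// write to the same element.
__global__ void testPerColumn(float* x, const float* y, float param, int colSize)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < colSize)
    {
        int i = 3 * col; // first element of this thread's column
        float norm = sqrtf(x[i] * x[i] + x[i + 1] * x[i + 1] + x[i + 2] * x[i + 2]);
        x[i]     = x[i] / norm;     // x = x.colwise().normalized()
        x[i + 1] = x[i + 1] / norm;
        x[i + 2] = x[i + 2] / norm;

        float var = y[i] * param; // y[i + 1] and y[i + 2] are reachable here as well
        // do something with var and x
    }
}

// Host side: Matrix3Xf is column-major and contiguous, so mat.data() can be
// copied to the device with a single cudaMemcpy (no float** needed).
void runOnGpu(Eigen::Matrix3Xf& mat, const Eigen::VectorXf& yHost, float param)
{
    int colSize = static_cast<int>(mat.cols());
    size_t bytes = sizeof(float) * 3 * colSize;

    float* d_x = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, mat.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, yHost.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (colSize + threads - 1) / threads;
    testPerColumn<<<blocks, threads>>>(d_x, d_y, param, colSize);

    cudaMemcpy(mat.data(), d_x, bytes, cudaMemcpyDeviceToHost); // results back into the Eigen matrix
    cudaFree(d_x);
    cudaFree(d_y);
}

Is this the right direction, or is there a better way to handle the colwise operations together with the per-element work on y?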
My environment is Visual Studio 2017 and CUDA 9.2.
Thanks in advance.