Hi all,
How can I pass parameters computed in the CPU program to a kernel function so that they are fast to use?
Example:
// T1i defined directly in the kernel: this is very fast.
__global__ void test_cuda( float* offset_x, float* offset_y, float* A, int wA){
…
float T1i_00 = -0.999365, T1i_01 = -0.00383612, T1i_02 = 12.0158,
T1i_10 = 0.00396264, T1i_11 = -0.99999, T1i_12 = -10.7236,
T1i_20 = -4.73512e-006, T1i_21 = -3.76513e-007, T1i_22 = 1.00069;
…
}
// T1i defined/calculated in the CPU program and then read in the kernel: very slow (about 10 times slower).
float* T1i;
…
__global__ void test_cuda_gT( float* offset_x, float* offset_y, float* A, float* T1, int wA){
…
float T1i_00 = T1[0], T1i_01 = T1[1], T1i_02 = T1[2],
T1i_10 = T1[3], T1i_11 = T1[4], T1i_12 = T1[5],
T1i_20 = T1[6], T1i_21 = T1[7], T1i_22 = T1[8];
…
}
I know that accessing device memory is inefficient. But how can I make the CPU-calculated parameters available in the kernel so that I get the same performance as with the direct definition in the kernel?
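One approach that might be worth trying is to pass the nine values by value inside a small struct instead of through a device pointer. Kernel arguments reach the device without a global-memory load (in constant or shared memory, depending on the architecture), so reading them should be cheap. The following is only a minimal sketch under assumptions: the names Mat3, test_cuda_byval and run_byval are hypothetical, and the kernel body is a placeholder, since the real computation is elided above.

```cuda
#include <cuda_runtime.h>

// Hypothetical container for the nine T1i coefficients (not in the original code).
// Row-major layout: m[0] = T1i_00, m[1] = T1i_01, ..., m[8] = T1i_22.
struct Mat3 {
    float m[9];
};

// The matrix is a by-value kernel argument, so reading T.m[k] does not
// require a load from global device memory.
__global__ void test_cuda_byval(float* out, Mat3 T, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Placeholder computation (assumption): first row of T applied to (x, y, 1)
    // with x = i and y = 0. The real kernel body would go here.
    float x = (float)i;
    float y = 0.0f;
    out[i] = T.m[0] * x + T.m[1] * y + T.m[2];
}

// Hypothetical host helper: launches the kernel for n elements and copies
// the results back into host_out. Returns true if every CUDA call succeeded.
bool run_byval(const Mat3& T, float* host_out, int n)
{
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return false;

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    test_cuda_byval<<<blocks, threads>>>(d_out, T, n);  // T is copied as an argument
    bool ok = (cudaGetLastError() == cudaSuccess);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(host_out, d_out, n * sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        ok = false;
    }
    cudaFree(d_out);
    return ok;
}
```

On the host, the struct would be filled from the CPU-calculated T1i array before calling run_byval. Declaring the nine values as a __constant__ array and filling it with cudaMemcpyToSymbol might be an alternative, but I have not measured either variant.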
Best regards!