Defining CPU-calculated parameters in a kernel function

Hi all,

How can I define parameters that are calculated in the CPU program inside a kernel function so that the kernel can use them quickly?

Example:

// Definition of T1i directly in the kernel: using it is very fast.
__global__ void test_cuda( float* offset_x, float* offset_y, float* A, int wA){

    float T1i_00 = -0.999365, T1i_01 = -0.00383612, T1i_02 = 12.0158,
          T1i_10 = 0.00396264, T1i_11 = -0.99999, T1i_12 = -10.7236,
          T1i_20 = -4.73512e-006, T1i_21 = -3.76513e-007, T1i_22 = 1.00069;

}


// Definition/calculation of T1i in the CPU program, then used in the kernel: very slow (about 10 times slower).
float* T1i;

__global__ void test_cuda_gT( float* offset_x, float* offset_y, float* A, float* T1, int wA){

    float T1i_00 = T1[0], T1i_01 = T1[1], T1i_02 = T1[2],
          T1i_10 = T1[3], T1i_11 = T1[4], T1i_12 = T1[5],
          T1i_20 = T1[6], T1i_21 = T1[7], T1i_22 = T1[8];

}

I know that accessing device memory is inefficient. But how can I define these CPU-calculated parameters in the kernel so that I get the same performance as with the direct definition in the kernel?

Best regards!

Now I know why the second test program is so slow.

It seems the pointer can only point to global memory.

I no longer pass the CPU-generated parameters as a pointer, but as individual float arguments instead.
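
A minimal sketch of that change, in case it helps someone else. Instead of nine separate float arguments it packs the matrix into a small struct that is passed to the kernel by value. The struct name Mat3, the kernel name apply_T1 and the kernel body are made up for illustration (the real kernel body is not shown above); only the T1i values are taken from the first example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical container for the 3x3 matrix T1i that is computed on the CPU.
struct Mat3 {
    float m[9];  // row-major: m[3 * row + col]
};

// T is a by-value kernel argument, so reading it does not go through
// a pointer into global memory.
__global__ void apply_T1(const float* x, const float* y,
                         float* out_x, float* out_y, Mat3 T, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Illustrative use only: apply T as a projective transform to (x, y).
    float w  = T.m[6] * x[i] + T.m[7] * y[i] + T.m[8];
    out_x[i] = (T.m[0] * x[i] + T.m[1] * y[i] + T.m[2]) / w;
    out_y[i] = (T.m[3] * x[i] + T.m[4] * y[i] + T.m[5]) / w;
}

int main()
{
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    float hx[n] = {0.0f, 1.0f, 2.0f, 3.0f};
    float hy[n] = {0.0f, 1.0f, 2.0f, 3.0f};
    float hox[n], hoy[n];

    // Filled on the CPU; here simply the constants from the first example.
    Mat3 T = {{ -0.999365f,     -0.00383612f,    12.0158f,
                 0.00396264f,   -0.99999f,      -10.7236f,
                -4.73512e-006f, -3.76513e-007f,   1.00069f }};

    float *dx, *dy, *dox, *doy;
    cudaMalloc((void**)&dx,  bytes);
    cudaMalloc((void**)&dy,  bytes);
    cudaMalloc((void**)&dox, bytes);
    cudaMalloc((void**)&doy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // The struct is copied to the device as part of the kernel launch.
    apply_T1<<<1, 32>>>(dx, dy, dox, doy, T, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hox, dox, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hoy, doy, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("(%g, %g) -> (%g, %g)\n", hx[i], hy[i], hox[i], hoy[i]);

    cudaFree(dx); cudaFree(dy); cudaFree(dox); cudaFree(doy);
    return 0;
}
```

This should build with something like nvcc sketch.cu -o sketch (the file name is arbitrary). Nine plain float arguments work the same way; the struct only keeps the parameter list short.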