Device function inside kernel

Hello. I have this kernel in CUDA:

__global__ void my_kernel(int* ch_d, int* ch_s, float* th, int* feat, float* val, float* xt, int ncl, int* count, int n_est, int el, int nb, float* Z) {
    int e;
    e = blockIdx.x * blockDim.x + threadIdx.x;
    float z0 = 0;
    z0 += my_func(ch_d, ch_s, th, feat, val, xt, ncl, count, e * ncl, el, nb);
    Z[0] = z0;
}

my_func is declared as a __device__ function so that it can be called from the kernel, and it returns a float. Z is an array of 5 elements. My problem is that when I print Z, Z[0] is never updated; it is always 0. Do you know a possible reason for this?
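For anyone trying to reproduce this, a minimal self-contained harness like the one below may help narrow it down. It is only a sketch under stated assumptions: the real my_func is not shown, so `my_func_stub`, `demo_kernel` and `run_demo` are hypothetical stand-ins. It checks for launch and execution errors (an unchecked failed launch leaves Z untouched, so a zero-initialized Z stays 0), guards threads past the end of the data, copies Z back to the host after the kernel finishes, and accumulates into Z[0] with `atomicAdd` rather than having every thread overwrite it with `Z[0] = z0`, which leaves whichever thread wrote last as the result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real my_func, which is not shown in the question.
__device__ float my_func_stub(int e) {
    return 1.0f + e;  // arbitrary per-thread value
}

// Each thread adds its contribution to Z[0] atomically instead of
// overwriting it, so the result does not depend on which thread writes last.
__global__ void demo_kernel(float* Z, int n) {
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= n) return;  // guard threads past the end of the data
    atomicAdd(&Z[0], my_func_stub(e));
}

// Runs the kernel for n elements and returns Z[0] as seen on the host,
// or -1.0f if any CUDA call fails.
float run_demo(int n) {
    float* Z_d = nullptr;
    float Z_h[5] = {0};
    cudaError_t err = cudaMalloc(&Z_d, 5 * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1.0f;
    }
    err = cudaMemset(Z_d, 0, 5 * sizeof(float));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    if (err == cudaSuccess) {
        demo_kernel<<<blocks, threads>>>(Z_d, n);
        // Launch errors (bad configuration, etc.) are reported here...
        err = cudaGetLastError();
    }
    // ...and errors raised while the kernel runs surface at synchronization.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    // Copy the device result back; printing Z_d directly from the host would not work.
    if (err == cudaSuccess)
        err = cudaMemcpy(Z_h, Z_d, 5 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(Z_d);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1.0f;
    }
    return Z_h[0];
}

int main() {
    // Sum of (1 + e) for e = 0..9 is 10 + 45 = 55.
    printf("Z[0] = %f\n", run_demo(10));
    return 0;
}
```

If the same checks added around the original my_kernel launch report an error, that would explain Z[0] staying at 0. If they do not, the overwrite race in `Z[0] = z0` is the next thing worth ruling out.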