Accessing specific elements of an array in device memory

I am new to CUDA programming. I am using cuBLAS to do some matrix operations. My main function consists of a for loop. In each iteration I need to get some values from the host and assign them to an array in device memory. Since copying data from host to device takes time, I thought of copying my whole host array to a device array first and then accessing the individual elements from there.

int n = 1000; // number of elements

float * d_in;

// allocate GPU memory
cudaMalloc((void **) &d_in, n*sizeof(float));

cudaMemcpy(d_in, h_in, n*sizeof(float) , cudaMemcpyHostToDevice);

Now d_in contains all the values. How can I access specific elements of the d_in array? (For example, in the first iteration I need elements 1-4 of d_in.)

Actually, I want to use these elements to multiply with another matrix using cublasSgemmBatched (Z*X, where Z is elements 1-4 of d_in and X is a predefined matrix).

CUDA version: 6.5

I suppose the easiest answer is a question: what kind of kernel dimensions do you have in mind?
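If the goal is only to hand sub-ranges of d_in to cuBLAS, a custom kernel may not be needed at all. A device pointer cannot be dereferenced on the host (d_in[3] in host code is invalid), but pointer arithmetic on it is fine, so d_in + 4*i is the device address of the i-th group of four elements. Below is a minimal sketch, not a definitive implementation. It assumes Z is a 1x4 row, X is a 4x4 column-major matrix (filled with an identity placeholder here, since the real X is not given), and that every iteration consumes the next four elements. The names d_X, d_out, d_A, d_B, d_C and the host arrays are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1000;        // number of elements, as in the question
    const int k = 4;           // elements used per iteration (assumption)
    const int batch = n / k;   // 250 independent (1x4) * (4x4) products

    // Placeholder host data: h_in = 0,1,2,...; X = identity (assumption).
    std::vector<float> h_in(n), h_X(k * k, 0.0f), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);
    for (int i = 0; i < k; ++i) h_X[i * k + i] = 1.0f;

    float *d_in, *d_X, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_X,   k * k * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_X,  h_X.data(),  k * k * sizeof(float), cudaMemcpyHostToDevice);

    // Build the per-batch pointers on the host using plain pointer arithmetic.
    // Nothing is dereferenced here; these are only device addresses.
    std::vector<const float *> h_A(batch), h_B(batch);
    std::vector<float *> h_C(batch);
    for (int i = 0; i < batch; ++i) {
        h_A[i] = d_in + i * k;    // Z for iteration i: elements 4i .. 4i+3
        h_B[i] = d_X;             // the same X for every product
        h_C[i] = d_out + i * k;   // 1x4 result for iteration i
    }

    // cublasSgemmBatched expects the pointer arrays themselves in device memory.
    const float **d_A, **d_B;
    float **d_C;
    cudaMalloc((void **)&d_A, batch * sizeof(float *));
    cudaMalloc((void **)&d_B, batch * sizeof(float *));
    cudaMalloc((void **)&d_C, batch * sizeof(float *));
    cudaMemcpy(d_A, h_A.data(), batch * sizeof(float *), cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B.data(), batch * sizeof(float *), cudaMemcpyHostToDevice);
    cudaMemcpy(d_C, h_C.data(), batch * sizeof(float *), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // C_i = Z_i * X with m = 1, n = 4, k = 4 (column-major, lda = ldc = 1, ldb = 4).
    cublasStatus_t st = cublasSgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                           1, k, k, &alpha,
                                           d_A, 1, d_B, k, &beta,
                                           d_C, 1, batch);
    if (st != CUBLAS_STATUS_SUCCESS) {
        std::fprintf(stderr, "cublasSgemmBatched failed: %d\n", (int)st);
        return 1;
    }

    // Copy all results back; with the identity placeholder they match h_in.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("first result: %g %g %g %g\n",
                h_out[0], h_out[1], h_out[2], h_out[3]);

    cublasDestroy(handle);
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    cudaFree(d_in); cudaFree(d_X); cudaFree(d_out);
    return 0;
}
```

This should build with something like nvcc example.cu -lcublas. The same offset trick works for a single slice: cudaMemcpy(h_tmp, d_in + 4*i, 4*sizeof(float), cudaMemcpyDeviceToHost) copies only the four elements of iteration i back to a host buffer h_tmp, and a hand-written kernel would index the same range as d_in[4*i + threadIdx.x].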