I am new to CUDA programming. I am using cuBLAS to do some matrix operations. My main function consists of a for loop. In each iteration I need to take some values from the host and assign them to an array in device memory. Since copying data from host to device takes time, I thought of copying my whole host array to a device array once, up front, and then accessing the elements I need in each iteration.
const int n = 1000; // number of elements in the host array h_in
float * d_in;
// allocate GPU memory
cudaMalloc((void **) &d_in, n*sizeof(float));
cudaMemcpy(d_in, h_in, n*sizeof(float) , cudaMemcpyHostToDevice);
So now d_in contains all the values. How can I access a specific element, or a range of elements, in the d_in array? (For example, in the first iteration I need elements 1-4 of d_in.)
Actually, I want to use these elements to multiply with another matrix using cublasSgemmBatched (Z*X, where Z holds elements 1-4 of d_in and X is a predefined matrix).
CUDA version: 6.5
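To make the question concrete, here is a minimal sketch of what I am trying to do. It assumes that a slice of d_in can be addressed from host code with plain pointer arithmetic (d_in + offset) and passed straight to cuBLAS. To keep it short it calls cublasSgemm once per iteration rather than cublasSgemmBatched. The names chunk, d_X and d_out, the 4x4 identity used for X, and the three-iteration loop are placeholders for illustration, not my real data.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1000;   // number of elements in h_in / d_in
    const int chunk = 4;  // elements needed per iteration (assumed)

    // Placeholder host data: h_in[i] = i
    std::vector<float> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    // Placeholder X: 4x4 identity (column-major), so Z*X should equal Z
    std::vector<float> h_X(chunk * chunk, 0.0f);
    for (int i = 0; i < chunk; ++i) h_X[i * chunk + i] = 1.0f;

    // Allocate device buffers and copy everything once, before the loop.
    // Error checking is omitted for brevity.
    float *d_in = NULL, *d_X = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_X, chunk * chunk * sizeof(float));
    cudaMalloc((void **)&d_out, chunk * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_X, h_X.data(), chunk * chunk * sizeof(float),
               cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    for (int it = 0; it < 3; ++it) {
        // Device address of the 4 elements for this iteration.
        // The pointer is only offset here, never dereferenced on the host.
        const float *d_Z = d_in + it * chunk;

        // out (1 x 4) = Z (1 x 4) * X (4 x 4), column-major storage
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    1, chunk, chunk,
                    &alpha, d_Z, 1, d_X, chunk,
                    &beta, d_out, 1);

        // Copy the small result back only to inspect it
        float h_out[chunk];
        cudaMemcpy(h_out, d_out, chunk * sizeof(float),
                   cudaMemcpyDeviceToHost);
        printf("iteration %d: %g %g %g %g\n",
               it, h_out[0], h_out[1], h_out[2], h_out[3]);
    }

    cublasDestroy(handle);
    cudaFree(d_in);
    cudaFree(d_X);
    cudaFree(d_out);
    return 0;
}
```

This should build with something like `nvcc sketch.cu -lcublas` (the file name is arbitrary). For cublasSgemmBatched, my understanding is that the same offset pointers would instead be collected into an array of device pointers that itself lives in device memory.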