I am new to CUDA programming and am using cuBLAS for some matrix operations. My main function consists of a for loop, and in each iteration I need to take some values from the host and assign them to an array in device memory. Since copying data from host to device takes time, I thought of copying the whole host array to a device array once, up front, and then accessing the elements I need in each iteration.
    int n = 1000;  // number of elements
    float *d_in;
    // allocate GPU memory
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
Now d_in holds all the values. How can I access a specific element, or a range of elements, in the d_in array? For example, in the first iteration I need elements 1-4 of d_in.
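To make the question concrete, here is a minimal sketch of one way a sub-range could be addressed: host code cannot dereference d_in, but it can do pointer arithmetic on it, so d_in + offset is itself a valid device pointer that can be handed to a kernel, a cuBLAS call, or cudaMemcpy. The chunk size of 4, the reading of "elements 1-4" as indices 0..3, and the copy back to h_check (only there to inspect the values) are my assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int n = 1000;      // number of elements, as in the question
    const int chunk = 4;     // assumed: 4 consecutive values per iteration

    // Fill the host array with recognizable values.
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    // One allocation and one host-to-device copy for the whole array.
    float *d_in = nullptr;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    for (int iter = 0; iter < 3; ++iter) {
        // Offsetting a device pointer on the host is fine;
        // dereferencing it on the host (d_chunk[0]) is not.
        const float *d_chunk = d_in + iter * chunk;

        // Copy the 4 values back only to show what d_chunk points at.
        float h_check[chunk];
        cudaMemcpy(h_check, d_chunk, chunk * sizeof(float),
                   cudaMemcpyDeviceToHost);
        printf("iteration %d: %g %g %g %g\n", iter,
               h_check[0], h_check[1], h_check[2], h_check[3]);
    }

    cudaFree(d_in);
    return 0;
}
```

In a real loop the copy back would normally be dropped and d_chunk passed straight to whatever device-side routine consumes it. This should build with something like nvcc example.cu.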
What I actually want is to multiply these elements by another matrix using cublasSgemmBatched (Z*X, where Z is made up of elements 1-4 of d_in and X is a predefined matrix).
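For reference, a rough sketch of how such a call might be set up. cublasSgemmBatched takes arrays of device pointers, and those pointer arrays must themselves live in device memory, so the per-batch addresses d_in + 4*i can be computed on the host and then copied over. The shapes are assumptions, since the post does not give them: each Z is taken to be a 2x2 column-major matrix built from 4 consecutive d_in values, X is a 2x2 matrix (the identity here, so that Z*X equals Z), and d_X, d_out and the pointer-array names are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1000;            // elements in d_in, as in the question
    const int dim = 2;             // assumed: Z and X are both 2x2
    const int per = dim * dim;     // 4 values of d_in per Z
    const int batch = n / per;     // 250 independent products

    std::vector<float> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);
    std::vector<float> h_X = {1.f, 0.f, 0.f, 1.f};  // identity, column-major

    float *d_in = nullptr, *d_X = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_X, per * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_X, h_X.data(), per * sizeof(float), cudaMemcpyHostToDevice);

    // Build the per-batch device addresses on the host.
    std::vector<const float *> h_A(batch), h_B(batch);
    std::vector<float *> h_C(batch);
    for (int i = 0; i < batch; ++i) {
        h_A[i] = d_in + i * per;   // Z_i: 4 consecutive values of d_in
        h_B[i] = d_X;              // the same X for every batch entry
        h_C[i] = d_out + i * per;  // where Z_i * X is written
    }

    // The pointer arrays themselves must be in device memory.
    const float **d_A = nullptr, **d_B = nullptr;
    float **d_C = nullptr;
    cudaMalloc((void **)&d_A, batch * sizeof(float *));
    cudaMalloc((void **)&d_B, batch * sizeof(float *));
    cudaMalloc((void **)&d_C, batch * sizeof(float *));
    cudaMemcpy(d_A, h_A.data(), batch * sizeof(float *), cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B.data(), batch * sizeof(float *), cudaMemcpyHostToDevice);
    cudaMemcpy(d_C, h_C.data(), batch * sizeof(float *), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // C_i = alpha * Z_i * X + beta * C_i for every i, all column-major.
    cublasStatus_t st = cublasSgemmBatched(
        handle, CUBLAS_OP_N, CUBLAS_OP_N, dim, dim, dim,
        &alpha, d_A, dim, d_B, dim, &beta, d_C, dim, batch);
    if (st != CUBLAS_STATUS_SUCCESS) printf("cublasSgemmBatched failed\n");

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("first product: %g %g %g %g\n",
           h_out[0], h_out[1], h_out[2], h_out[3]);

    cublasDestroy(handle);
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    cudaFree(d_in); cudaFree(d_X); cudaFree(d_out);
    return 0;
}
```

This should build with something like nvcc example.cu -lcublas. Note that one batched call covers what would otherwise be many loop iterations. When every Z sits at a fixed stride in d_in, as assumed here, cublasSgemmStridedBatched may be simpler because it takes base pointers plus strides instead of pointer arrays.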