Hi, I'm trying to port my ML/DL algorithm to CUDA C. My model has weights for many layers. Calling cudaMalloc and cudaMemcpy for each layer's weights takes a lot of time, and I want to avoid both calls since I already have the weight values. How can I use my weights directly inside a CUDA function / CUDA kernel?
Example weights from my model:
float prelu[1] = { 0.0023f };
float conv1d[512 * 64] = { /* all 512 * 64 values */ };
For now I'm managing like this:
I call cudaMalloc before the function call, and inside the function I use cudaMemcpy to copy the weights to the device.
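To make the current setup concrete, here is a minimal sketch of the pattern described above. It is a simplified stand-in, not my real code: the names `run_layer` and `scale_by_weights` are hypothetical, the kernel body is a placeholder, and the host array is filled with a dummy value instead of the actual 512 * 64 weights.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CONV1D_SIZE (512 * 64)

// Host-side weights. Placeholder storage: the real array holds all 512 * 64 values.
static float conv1d[CONV1D_SIZE];

// Hypothetical kernel that only reads the weights (stands in for the real layer).
__global__ void scale_by_weights(const float* w, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2.0f * w[i];
    }
}

// Hypothetical layer function: the host-to-device copy is repeated on every call.
void run_layer(float* d_w, float* d_out) {
    cudaMemcpy(d_w, conv1d, sizeof(conv1d), cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (CONV1D_SIZE + threads - 1) / threads;
    scale_by_weights<<<blocks, threads>>>(d_w, d_out, CONV1D_SIZE);
}

int main() {
    // Dummy weight values, assumed only for this sketch.
    for (int i = 0; i < CONV1D_SIZE; ++i) {
        conv1d[i] = 0.0023f;
    }

    // cudaMalloc happens once, before the function call.
    float* d_w = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_w, sizeof(conv1d));
    cudaMalloc(&d_out, sizeof(conv1d));

    run_layer(d_w, d_out);

    // Wait for the kernel and report any error from the calls above.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_w);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

The cudaMalloc and cudaMemcpy calls in this sketch are the ones I would like to get rid of, since the weight values are already known when the program is built.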