How to avoid cudaMalloc and cudaMemcpy for the weights of my algorithm?

Hi, I’m trying to port my ML/DL algorithm to CUDA C. My model has weights for many layers. cudaMalloc and cudaMemcpy for the layer weights are taking a lot of time, and I want to avoid them since I already have the weight values. How can I directly use my weights inside a CUDA function / CUDA kernel?

Example weights from my model:

float prelu[1] = { 0.0023f };
float conv1d[512 * 64] = { /* all 512 * 64 values */ };

For now I’m managing it like this: cudaMalloc before the function call, and inside the function call I use cudaMemcpy for the weights.
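Roughly, this is the pattern I have now (the names here are just illustrative, not my real code):

```cuda
float *d_conv1d = nullptr;

// before the function call:
cudaMalloc(&d_conv1d, 512 * 64 * sizeof(float));

// inside the function, every time it is called:
cudaMemcpy(d_conv1d, conv1d, 512 * 64 * sizeof(float),
           cudaMemcpyHostToDevice);
my_kernel<<<grid, block>>>(d_conv1d /* , ... */);
```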

Your provided code doesn’t make much sense as written, but that is beside the point.

If you have data in CPU memory (weights) that you would like to use in a CUDA kernel, you will need to transfer that data to the GPU. Although there are several ways to do this, there is no magic here. The cudaMalloc/cudaMemcpy approach is typical, reasonable, and essentially unavoidable.
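That said, you only need to pay the cudaMalloc/cudaMemcpy cost once, at initialization, rather than on every call. A minimal sketch of that pattern (function and variable names here are hypothetical, not from your code):

```cuda
#include <cuda_runtime.h>

// Hypothetical layer-weight setup: allocate and copy once at startup,
// then reuse the device pointer for every subsequent kernel launch.
static float *d_weights = nullptr;
static const size_t N = 512 * 64;

void init_weights(const float *h_weights) {
    cudaMalloc(&d_weights, N * sizeof(float));
    cudaMemcpy(d_weights, h_weights, N * sizeof(float),
               cudaMemcpyHostToDevice);   // one-time host-to-device transfer
}

__global__ void conv_kernel(const float *w /* , ... */) {
    // the kernel reads w directly from global memory;
    // no per-call copy is needed
}

void run_inference(/* ... */) {
    // no cudaMalloc/cudaMemcpy here; the weights are already resident
    conv_kernel<<<128, 256>>>(d_weights);
}
```

If you are currently calling cudaMemcpy for the weights inside the function on every invocation, hoisting the copy out this way removes the per-call cost while keeping the same, standard API.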

Thank you so much for your reply.

Yes, I have data in CPU memory (weights) that I want to use in a CUDA kernel. My doubt here is: can’t we directly assign the weights in GPU memory?