Is it possible to split one CUDA program into separate stages (copy once, calculate many times, free once)?

Hi there,
I have received good advice from this forum before, and I always appreciate it! :)

At the moment, I am trying to write a program that performs a huge matrix calculation.
But there is one difference from the usual structure.
I want to divide the flow of one CUDA .cu program into three parts, as below:

  1. Copy host data to the device

While loop
  2. Calculate the huge matrix
End

  3. Free the device memory

The reason I want this structure is that copying large host data to the device is a costly and bothersome step.
If I write the program as a single .cu file, I have to copy, calculate, and free on every execution.
It would be much more convenient to copy the host data to the device only once.
After that, I would just run the calculation repeatedly while changing some parameters, such as the matrix size or indices.
Is it possible to structure a program like this?

Thank you in advance.

Sincerely,
Albert

If you mean calling a kernel many times from inside that while loop, then this is a very common approach. See the sample code in the SDK, e.g. the particles or n-body samples.
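
For illustration, here is a minimal sketch of that pattern, not taken from the SDK samples. The kernel matVecLeading, the helper runSweep, and the choice of a matrix-vector product over the leading m x m block are all hypothetical stand-ins for the real calculation. The point is only that device memory allocated with cudaMalloc stays valid across kernel launches, so the large input is uploaded once and the loop changes nothing but the kernel arguments.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Hypothetical stand-in for the real calculation: y = A_m * x_m, where A_m is
// the leading m x m block of a row-major N x N matrix A.
__global__ void matVecLeading(const float* A, const float* x, float* y,
                              int N, int m)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m) {
        float sum = 0.0f;
        for (int col = 0; col < m; ++col)
            sum += A[row * N + col] * x[col];
        y[row] = sum;
    }
}

// Hypothetical helper: upload once, launch once per block size m = 1..N,
// free once. Returns y[0] from each launch.
std::vector<float> runSweep(const std::vector<float>& A,
                            const std::vector<float>& x, int N)
{
    const size_t matBytes = static_cast<size_t>(N) * N * sizeof(float);
    const size_t vecBytes = static_cast<size_t>(N) * sizeof(float);
    float *dA = nullptr, *dX = nullptr, *dY = nullptr;

    // 1. Copy host data to the device (done only once).
    CUDA_CHECK(cudaMalloc(&dA, matBytes));
    CUDA_CHECK(cudaMalloc(&dX, vecBytes));
    CUDA_CHECK(cudaMalloc(&dY, vecBytes));
    CUDA_CHECK(cudaMemcpy(dA, A.data(), matBytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dX, x.data(), vecBytes, cudaMemcpyHostToDevice));

    // 2. While loop: only the parameter m changes; dA and dX stay on the device.
    std::vector<float> firstEntries;
    const int threads = 128;
    int m = 1;
    while (m <= N) {
        const int blocks = (m + threads - 1) / threads;
        matVecLeading<<<blocks, threads>>>(dA, dX, dY, N, m);
        CUDA_CHECK(cudaGetLastError());

        // Copy back only the small result. This blocking cudaMemcpy on the
        // default stream waits for the kernel above to finish.
        float y0 = 0.0f;
        CUDA_CHECK(cudaMemcpy(&y0, dY, sizeof(float), cudaMemcpyDeviceToHost));
        firstEntries.push_back(y0);
        ++m;
    }

    // 3. Free the device memory (done only once).
    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dX));
    CUDA_CHECK(cudaFree(dY));
    return firstEntries;
}

int main()
{
    const int N = 4;  // tiny size, for illustration only
    std::vector<float> A(N * N, 1.0f), x(N, 1.0f);
    std::vector<float> y0 = runSweep(A, x, N);
    for (size_t i = 0; i < y0.size(); ++i)
        std::printf("m=%zu  y[0]=%.1f\n", i + 1, y0[i]);
    return 0;
}
```

Note that this only works within one running process: the device buffers are released when the program exits, so the copy, the loop, and the free all have to live in the same executable, even if they sit in different functions or source files.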

Dear kbam,

Thank you for the comment.
I am going to look for it in the sample code! Thank you!

Sincerely,
Albert

Dear kbam,

Good morning!
I searched the examples in the CUDA sample browser.
Yes, the structure of nBody is what I want to build. :)
The result updates as the variables are changed.

I will try to follow the same structure as nBody.
Thank you!

Sincerely,
Albert