I have received helpful advice from this forum before, and I always appreciate it! :)
Right now I am trying to write a program that performs calculations on a huge matrix.
However, I want to structure it a little differently from the usual approach.
I want to split the flow of a single CUDA .cu program into three separate parts, as follows:
1) Copy the host data to the device
2) Calculate on the huge matrix
3) Free the device memory
The reason is that copying a large amount of host data to the device is a bothersome (and slow) step.
If I write the program as a single .cu file, I have to copy, calculate, and free on every execution.
It would be really convenient to copy the host data to the device only once.
Once that is done, I just want to run the calculation repeatedly while changing some parameters such as the matrix size or indices.
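To make the idea concrete, here is a rough sketch of the structure I have in mind. It assumes all three steps live inside one running process. The names upload_matrix, compute, release_matrix and the scale_block kernel are placeholders I made up for illustration, and the kernel only scales a sub-block of a row-major matrix as a stand-in for the real calculation.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): multiplies a rows x cols sub-block,
// starting at (row0, col0), of a row-major matrix with leading dimension ld.
__global__ void scale_block(float *m, int ld, int row0, int col0,
                            int rows, int cols, float factor)
{
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < rows && c < cols)
        m[(size_t)(row0 + r) * ld + (col0 + c)] *= factor;
}

// Step 1: allocate device memory and copy the host data once.
// Returns the device pointer, or nullptr on failure.
float *upload_matrix(const std::vector<float> &host)
{
    float *dev = nullptr;
    size_t bytes = host.size() * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess)
        return nullptr;
    if (cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return nullptr;
    }
    return dev;
}

// Step 2: run the calculation on data that is already on the device.
// Can be called many times with different indices and sizes.
cudaError_t compute(float *dev, int ld, int row0, int col0,
                    int rows, int cols, float factor)
{
    dim3 threads(16, 16);
    dim3 blocks((cols + threads.x - 1) / threads.x,
                (rows + threads.y - 1) / threads.y);
    scale_block<<<blocks, threads>>>(dev, ld, row0, col0, rows, cols, factor);
    cudaError_t err = cudaGetLastError();   // catches launch errors
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();         // waits for the kernel to finish
}

// Step 3: free the device memory at the very end.
void release_matrix(float *dev)
{
    cudaFree(dev);
}
```

With this layout I would call upload_matrix once, call compute as many times as needed with different row0, col0, rows, and cols values, and call release_matrix only at the very end.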
Is it possible to do it this way?
Thank you in advance.