Hi, all:
I want to improve the performance of my code using streams. My idea is as follows, but I don't know if it's right. Can anybody help check it? Any suggestions are appreciated.
There is one model that calls 3 kernels, which run on the GPU in order, and another model that transfers memory from host to device. These 2 models can run in parallel. So I want to bind the 3 kernels of the first model to one stream (stream A), and bind the host-to-device memory transfer of the other model to a second stream (stream B). This may let the kernel execution overlap with the memory transfer.
The setup is as follows:
Stream A ( model A )
kernel 1
kernel 2
kernel 3
Stream B ( model B )
memory transfer from host to device
(There is no dependency between model A and model B, so they can run in parallel.)
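
A minimal sketch of the layout above, in case it helps to check the idea. The kernel bodies, buffer names (dA, hB, dB), and sizes are hypothetical placeholders, not the real code. It assumes the host buffer for model B is pinned (allocated with cudaMallocHost), since cudaMemcpyAsync from pageable memory generally will not overlap with kernels, and it assumes the device can copy and compute concurrently (asyncEngineCount > 0 in cudaDeviceProp).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-ins for the three kernels of model A.
__global__ void kernel1(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}
__global__ void kernel2(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}
__global__ void kernel3(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] -= 3.0f;
}

int main() {
    const int n = 1 << 20;                 // assumed problem size
    const size_t bytes = n * sizeof(float);

    // Model A: device buffer processed by the three kernels.
    float *dA = nullptr;
    CHECK(cudaMalloc((void **)&dA, bytes));
    CHECK(cudaMemset(dA, 0, bytes));

    // Model B: pinned host buffer and its device destination.
    float *hB = nullptr, *dB = nullptr;
    CHECK(cudaMallocHost((void **)&hB, bytes));
    CHECK(cudaMalloc((void **)&dB, bytes));
    for (int i = 0; i < n; ++i) hB[i] = (float)i;

    cudaStream_t streamA, streamB;
    CHECK(cudaStreamCreate(&streamA));
    CHECK(cudaStreamCreate(&streamB));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Stream A: the kernels run in issue order within the stream.
    kernel1<<<blocks, threads, 0, streamA>>>(dA, n);
    kernel2<<<blocks, threads, 0, streamA>>>(dA, n);
    kernel3<<<blocks, threads, 0, streamA>>>(dA, n);
    CHECK(cudaGetLastError());

    // Stream B: independent host-to-device copy that may overlap with stream A.
    CHECK(cudaMemcpyAsync(dB, hB, bytes, cudaMemcpyHostToDevice, streamB));

    // Wait for both streams before using the results.
    CHECK(cudaStreamSynchronize(streamA));
    CHECK(cudaStreamSynchronize(streamB));

    // Spot check: each element of dA should be (0 + 1) * 2 - 3.
    float a0 = 0.0f;
    CHECK(cudaMemcpy(&a0, dA, sizeof(float), cudaMemcpyDeviceToHost));
    printf("dA[0] = %f\n", a0);

    CHECK(cudaStreamDestroy(streamA));
    CHECK(cudaStreamDestroy(streamB));
    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFreeHost(hB));
    return 0;
}
```

Both streams here are created with cudaStreamCreate rather than using the default stream, because work in the legacy default stream synchronizes with other streams and would serialize the two models. Whether the copy actually overlaps with the kernels depends on the hardware and is best confirmed in a profiler timeline.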